2025-09-07T06:38:45.1519296Z Current runner version: '2.328.0'
2025-09-07T06:38:45.1522195Z Runner name: 'linux.rocm.gpu.gfx942.1-xb8kr-runner-hql9s'
2025-09-07T06:38:45.1522558Z Runner group name: 'default'
2025-09-07T06:38:45.1522948Z Machine name: 'linux'
2025-09-07T06:38:45.1523988Z ##[group]GITHUB_TOKEN Permissions
2025-09-07T06:38:45.1525034Z Contents: read
2025-09-07T06:38:45.1525250Z Metadata: read
2025-09-07T06:38:45.1525457Z ##[endgroup]
2025-09-07T06:38:45.1526644Z Secret source: Actions
2025-09-07T06:38:45.1526917Z Prepare workflow directory
2025-09-07T06:38:45.1797730Z Prepare all required actions
2025-09-07T06:38:45.1821539Z Getting action download info
2025-09-07T06:38:45.4915404Z Download action repository 'pytorch/pytorch@main' (SHA:93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T06:38:47.8770473Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-09-07T06:38:48.2088087Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-09-07T06:38:48.5074756Z Download action repository 'pytorch/test-infra@main' (SHA:548a4bc624d43a01cdf165a63b041f0ae014ddbd)
2025-09-07T06:38:49.0399023Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-09-07T06:38:49.5344487Z Getting action download info
2025-09-07T06:38:49.6666918Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955)
2025-09-07T06:38:49.9804084Z Getting action download info
2025-09-07T06:38:50.1211427Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-09-07T06:38:50.4781001Z Getting action download info
2025-09-07T06:38:50.6159738Z Uses: pytorch/pytorch/.github/workflows/_rocm-test.yml@refs/heads/main (93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T06:38:50.6161872Z ##[group] Inputs
2025-09-07T06:38:50.6162057Z build-environment: linux-noble-rocm-py3.12-mi300
2025-09-07T06:38:50.6162700Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}]}
2025-09-07T06:38:50.6163491Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77
2025-09-07T06:38:50.6163773Z sync-tag:
2025-09-07T06:38:50.6164213Z timeout-minutes: 300
2025-09-07T06:38:50.6164314Z tests-to-include:
2025-09-07T06:38:50.6164415Z dashboard-tag:
2025-09-07T06:38:50.6164649Z disable-monitor: true
2025-09-07T06:38:50.6164756Z monitor-log-interval: 5
2025-09-07T06:38:50.6164870Z monitor-data-collect-interval: 1
2025-09-07T06:38:50.6164992Z ##[endgroup]
2025-09-07T06:38:50.6165169Z Complete job name: linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1)
2025-09-07T06:38:50.6589407Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-09-07T06:38:50.6589652Z with:
2025-09-07T06:38:50.6589733Z no-sudo: true
2025-09-07T06:38:50.6589819Z submodules: recursive
2025-09-07T06:38:50.6589908Z fetch-depth: 0
2025-09-07T06:38:50.6590031Z env:
2025-09-07T06:38:50.6590124Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:50.6590246Z ##[endgroup]
2025-09-07T06:38:50.6638319Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:38:50.6638675Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:38:50.6647923Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:38:50.6648068Z env:
2025-09-07T06:38:50.6648151Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:50.6648243Z ##[endgroup]
2025-09-07T06:38:50.6786728Z ##[group]Run # Use all available CPUs for fetching
2025-09-07T06:38:50.6786962Z # Use all available CPUs for fetching
2025-09-07T06:38:50.6787099Z cd "${GITHUB_WORKSPACE}"
2025-09-07T06:38:50.6787241Z git config --global fetch.parallel 0
2025-09-07T06:38:50.6787384Z git config --global submodule.fetchJobs 0
2025-09-07T06:38:50.6787612Z
2025-09-07T06:38:50.6787749Z # Clean workspace. The default checkout action should also do this, but
2025-09-07T06:38:50.6787922Z # do it here as well just in case
2025-09-07T06:38:50.6788049Z if [[ -d .git ]]; then
2025-09-07T06:38:50.6788162Z  if [ -z "${NO_SUDO}" ]; then
2025-09-07T06:38:50.6788302Z  sudo git clean -ffdx
2025-09-07T06:38:50.6788409Z  else
2025-09-07T06:38:50.6788506Z  git clean -ffdx
2025-09-07T06:38:50.6788608Z  fi
2025-09-07T06:38:50.6788695Z fi
2025-09-07T06:38:50.6794272Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:38:50.6794417Z env:
2025-09-07T06:38:50.6794508Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:50.6794603Z NO_SUDO: true
2025-09-07T06:38:50.6794696Z ##[endgroup]
2025-09-07T06:38:50.7076210Z ##[group]Run actions/checkout@v4
2025-09-07T06:38:50.7076352Z with:
2025-09-07T06:38:50.7076459Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634
2025-09-07T06:38:50.7076699Z fetch-depth: 0
2025-09-07T06:38:50.7076794Z submodules: recursive
2025-09-07T06:38:50.7076891Z show-progress: false
2025-09-07T06:38:50.7076996Z repository: pytorch/pytorch
2025-09-07T06:38:50.7080247Z token: ***
2025-09-07T06:38:50.7080342Z ssh-strict: true
2025-09-07T06:38:50.7080442Z ssh-user: git
2025-09-07T06:38:50.7080529Z persist-credentials: true
2025-09-07T06:38:50.7080627Z clean: true
2025-09-07T06:38:50.7080716Z sparse-checkout-cone-mode: true
2025-09-07T06:38:50.7080823Z fetch-tags: false
2025-09-07T06:38:50.7080905Z lfs: false
2025-09-07T06:38:50.7080987Z set-safe-directory: true
2025-09-07T06:38:50.7081083Z env:
2025-09-07T06:38:50.7081161Z GIT_DEFAULT_BRANCH: main
2025-09-07T06:38:50.7081255Z ##[endgroup]
2025-09-07T06:38:50.7621157Z Syncing repository: pytorch/pytorch
2025-09-07T06:38:50.7621744Z ##[group]Getting Git version info
2025-09-07T06:38:50.7621911Z Working directory is '/home/runner/_work/pytorch/pytorch'
2025-09-07T06:38:50.7622146Z [command]/usr/bin/git version
2025-09-07T06:38:50.7622248Z git version 2.51.0
2025-09-07T06:38:50.7632490Z ##[endgroup]
2025-09-07T06:38:50.7638379Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/ef701089-0376-4825-9255-7fa187edefba/.gitconfig'
2025-09-07T06:38:50.7844895Z Temporarily overriding HOME='/home/runner/_work/_temp/ef701089-0376-4825-9255-7fa187edefba' before making global git config changes
2025-09-07T06:38:50.7845283Z Adding repository directory to the temporary git global config as a safe directory
2025-09-07T06:38:50.7854065Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch
2025-09-07T06:38:50.7874042Z Deleting the contents of '/home/runner/_work/pytorch/pytorch'
2025-09-07T06:38:50.7876280Z ##[group]Initializing the repository
2025-09-07T06:38:50.7879022Z [command]/usr/bin/git init /home/runner/_work/pytorch/pytorch
2025-09-07T06:38:50.8149189Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-09-07T06:38:50.8149407Z hint: is subject to change.
To configure the initial branch name to use in all
2025-09-07T06:38:50.8149610Z hint: of your new repositories, which will suppress this warning, call:
2025-09-07T06:38:50.8149777Z hint:
2025-09-07T06:38:50.8149899Z hint: git config --global init.defaultBranch <name>
2025-09-07T06:38:50.8150206Z hint:
2025-09-07T06:38:50.8150348Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-09-07T06:38:50.8150546Z hint: 'development'. The just-created branch can be renamed via this command:
2025-09-07T06:38:50.8150690Z hint:
2025-09-07T06:38:50.8150773Z hint: git branch -m <name>
2025-09-07T06:38:50.8150873Z hint:
2025-09-07T06:38:50.8151001Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-09-07T06:38:50.8151208Z Initialized empty Git repository in /home/runner/_work/pytorch/pytorch/.git/
2025-09-07T06:38:50.8155560Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-09-07T06:38:50.8430148Z ##[endgroup]
2025-09-07T06:38:50.8430325Z ##[group]Disabling automatic garbage collection
2025-09-07T06:38:50.8433505Z [command]/usr/bin/git config --local gc.auto 0
2025-09-07T06:38:50.8451242Z ##[endgroup]
2025-09-07T06:38:50.8451600Z ##[group]Setting up auth
2025-09-07T06:38:50.8452897Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-09-07T06:38:50.8474644Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-09-07T06:38:50.8627507Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-09-07T06:38:50.8644473Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-09-07T06:38:50.8813791Z [command]/usr/bin/git config
--local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T06:38:50.8922058Z ##[endgroup] 2025-09-07T06:38:50.8922518Z ##[group]Fetching the repository 2025-09-07T06:38:50.8930824Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-09-07T06:39:17.5246654Z From https://github.com/pytorch/pytorch 2025-09-07T06:39:17.5247586Z * [new branch] 160583 -> origin/160583 2025-09-07T06:39:17.5248167Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-09-07T06:39:17.5270165Z * [new branch] 5addvllmbuild -> origin/5addvllmbuild 2025-09-07T06:39:17.5270462Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-09-07T06:39:17.5270784Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-09-07T06:39:17.5271038Z * [new branch] ISSUE-154849 -> origin/ISSUE-154849 2025-09-07T06:39:17.5271514Z * [new branch] JackCaoG/dynamo_make_fx_non_core_aten_ops -> origin/JackCaoG/dynamo_make_fx_non_core_aten_ops 2025-09-07T06:39:17.5271792Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-09-07T06:39:17.5272042Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-09-07T06:39:17.5272306Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-09-07T06:39:17.5272921Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-09-07T06:39:17.5273174Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-09-07T06:39:17.5273401Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-09-07T06:39:17.5273659Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-09-07T06:39:17.5273921Z * [new branch] VLA_exp -> origin/VLA_exp 2025-09-07T06:39:17.5274168Z * [new branch] actually-run-mps-aot-inductor -> origin/actually-run-mps-aot-inductor 2025-09-07T06:39:17.5274581Z * [new 
branch] add-missing-args-normalization -> origin/add-missing-args-normalization 2025-09-07T06:39:17.5274850Z * [new branch] add-user-guide-structure -> origin/add-user-guide-structure 2025-09-07T06:39:17.5275063Z * [new branch] add-vllm-nightly-build -> origin/add-vllm-nightly-build 2025-09-07T06:39:17.5275264Z * [new branch] add_compile_benchmarking -> origin/add_compile_benchmarking 2025-09-07T06:39:17.5275469Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-09-07T06:39:17.5275645Z * [new branch] addsimde -> origin/addsimde 2025-09-07T06:39:17.5275808Z * [new branch] addvllmtest -> origin/addvllmtest 2025-09-07T06:39:17.5275992Z * [new branch] adi/acl_upgrade -> origin/adi/acl_upgrade 2025-09-07T06:39:17.5276320Z * [new branch] adi/test -> origin/adi/test 2025-09-07T06:39:17.5276588Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-09-07T06:39:17.5276773Z * [new branch] adi/test_fusions -> origin/adi/test_fusions 2025-09-07T06:39:17.5277109Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-09-07T06:39:17.5277405Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-09-07T06:39:17.5277651Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-09-07T06:39:17.5277832Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-09-07T06:39:17.5278219Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-09-07T06:39:17.5278423Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-09-07T06:39:17.5278611Z * [new branch] alt-disable -> origin/alt-disable 2025-09-07T06:39:17.5278854Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-09-07T06:39:17.5279097Z * [new branch] angelayi/aoti_inductor_fx -> origin/angelayi/aoti_inductor_fx 2025-09-07T06:39:17.5279305Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-09-07T06:39:17.5279505Z * [new branch] angelayi/benchmark2 -> 
origin/angelayi/benchmark2 2025-09-07T06:39:17.5279754Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-09-07T06:39:17.5279988Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-09-07T06:39:17.5280195Z * [new branch] angelayi/custom_op_subgraph -> origin/angelayi/custom_op_subgraph 2025-09-07T06:39:17.5280406Z * [new branch] angelayi/customop -> origin/angelayi/customop 2025-09-07T06:39:17.5280605Z * [new branch] angelayi/fake_cache_empty -> origin/angelayi/fake_cache_empty 2025-09-07T06:39:17.5280820Z * [new branch] angelayi/is_symbolic_tracing -> origin/angelayi/is_symbolic_tracing 2025-09-07T06:39:17.5281016Z * [new branch] angelayi/item -> origin/angelayi/item 2025-09-07T06:39:17.5281260Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-09-07T06:39:17.5281456Z * [new branch] angelayi/opoverload -> origin/angelayi/opoverload 2025-09-07T06:39:17.5281651Z * [new branch] angelayi/pattern -> origin/angelayi/pattern 2025-09-07T06:39:17.5281831Z * [new branch] angelayi/pytree -> origin/angelayi/pytree 2025-09-07T06:39:17.5282016Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-09-07T06:39:17.5282207Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-09-07T06:39:17.5282435Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-09-07T06:39:17.5282619Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-09-07T06:39:17.5282806Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-09-07T06:39:17.5282992Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-09-07T06:39:17.5283172Z * [new branch] aoti_weight_sharing -> origin/aoti_weight_sharing 2025-09-07T06:39:17.5283381Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-09-07T06:39:17.5283616Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 
2025-09-07T06:39:17.5283833Z * [new branch] atalman-patch-1 -> origin/atalman-patch-1 2025-09-07T06:39:17.5284014Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-09-07T06:39:17.5284191Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-09-07T06:39:17.5284363Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-09-07T06:39:17.5284535Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-09-07T06:39:17.5284727Z * [new branch] atalman_inductor_2.3.0 -> origin/atalman_inductor_2.3.0 2025-09-07T06:39:17.5284999Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-09-07T06:39:17.5285215Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-09-07T06:39:17.5285411Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-09-07T06:39:17.5285655Z * [new branch] autoupdate-transformers-pin-via-pr -> origin/autoupdate-transformers-pin-via-pr 2025-09-07T06:39:17.5285893Z * [new branch] bahuang/dtensor_demo -> origin/bahuang/dtensor_demo 2025-09-07T06:39:17.5286081Z * [new branch] bahuang/test -> origin/bahuang/test 2025-09-07T06:39:17.5286246Z * [new branch] base/1.5 -> origin/base/1.5 2025-09-07T06:39:17.5286452Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-09-07T06:39:17.5286778Z * [new branch] bc-lint-config -> origin/bc-lint-config 2025-09-07T06:39:17.5286969Z * [new branch] bc-lint-test-new-config -> origin/bc-lint-test-new-config 2025-09-07T06:39:17.5287173Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-09-07T06:39:17.5287390Z * [new branch] benchmarker_compat_with_do_bench -> origin/benchmarker_compat_with_do_bench 2025-09-07T06:39:17.5287615Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-09-07T06:39:17.5287815Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-09-07T06:39:17.5288002Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 
2025-09-07T06:39:17.5288195Z * [new branch] bf/cg-custom-wrapper -> origin/bf/cg-custom-wrapper 2025-09-07T06:39:17.5288430Z * [new branch] bf/cg-or-error -> origin/bf/cg-or-error 2025-09-07T06:39:17.5288607Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-09-07T06:39:17.5288792Z * [new branch] bf/cg-skip-1-kernel -> origin/bf/cg-skip-1-kernel 2025-09-07T06:39:17.5288972Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-09-07T06:39:17.5289202Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-09-07T06:39:17.5289549Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-09-07T06:39:17.5289898Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-09-07T06:39:17.5290115Z * [new branch] bf/default-recompile-reason -> origin/bf/default-recompile-reason 2025-09-07T06:39:17.5290337Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-09-07T06:39:17.5290525Z * [new branch] bf/exp -> origin/bf/exp 2025-09-07T06:39:17.5290705Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-09-07T06:39:17.5290904Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-09-07T06:39:17.5291104Z * [new branch] bf/partition-turn-on -> origin/bf/partition-turn-on 2025-09-07T06:39:17.5291314Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-09-07T06:39:17.5291506Z * [new branch] bf/rope -> origin/bf/rope 2025-09-07T06:39:17.5291695Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-09-07T06:39:17.5291912Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-09-07T06:39:17.5292122Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-09-07T06:39:17.5292334Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> 
origin/bisect_perf_hf_T5_40d0740e73d 2025-09-07T06:39:17.5292544Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-09-07T06:39:17.5292748Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-09-07T06:39:17.5292958Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-09-07T06:39:17.5293165Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-09-07T06:39:17.5293379Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-09-07T06:39:17.5293588Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-09-07T06:39:17.5293793Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-09-07T06:39:17.5293995Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-09-07T06:39:17.5294204Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-09-07T06:39:17.5294419Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-09-07T06:39:17.5294631Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-09-07T06:39:17.5294835Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-09-07T06:39:17.5295046Z * [new branch] bowbao/bench_updates_stage -> origin/bowbao/bench_updates_stage 2025-09-07T06:39:17.5295247Z * [new branch] bowbao/dort_rewriter -> origin/bowbao/dort_rewriter 2025-09-07T06:39:17.5295468Z * [new branch] bowbao/wip_prs -> origin/bowbao/wip_prs 2025-09-07T06:39:17.5295661Z * [new branch] brister/break_tensorbox -> origin/brister/break_tensorbox 2025-09-07T06:39:17.5295868Z * [new branch] brister/custom_fx_backend -> origin/brister/custom_fx_backend 2025-09-07T06:39:17.5296073Z * [new branch] brister/fx_custom_triton -> origin/brister/fx_custom_triton 
2025-09-07T06:39:17.5296278Z * [new branch] brister/tensor_box_output -> origin/brister/tensor_box_output 2025-09-07T06:39:17.5296618Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-09-07T06:39:17.5296890Z * [new branch] c57382a49 -> origin/c57382a49 2025-09-07T06:39:17.5297054Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-09-07T06:39:17.5297232Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-09-07T06:39:17.5297582Z * [new branch] camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 -> origin/camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 2025-09-07T06:39:17.5297950Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-09-07T06:39:17.5298216Z * [new branch] cherry-pick-149654-by-pytorch_bot_bot_ -> origin/cherry-pick-149654-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5298494Z * [new branch] cherry-pick-151939-by-pytorch_bot_bot_ -> origin/cherry-pick-151939-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5298765Z * [new branch] cherry-pick-154174-by-pytorch_bot_bot_ -> origin/cherry-pick-154174-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5299039Z * [new branch] cherry-pick-156260-by-pytorch_bot_bot_ -> origin/cherry-pick-156260-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5299315Z * [new branch] cherry-pick-157453-by-pytorch_bot_bot_ -> origin/cherry-pick-157453-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5299587Z * [new branch] cherry-pick-157513-by-pytorch_bot_bot_ -> origin/cherry-pick-157513-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5299854Z * [new branch] cherry-pick-157695-by-pytorch_bot_bot_ -> origin/cherry-pick-157695-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5300126Z * [new branch] cherry-pick-157732-by-pytorch_bot_bot_ -> origin/cherry-pick-157732-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5300394Z * [new branch] cherry-pick-158537-by-pytorch_bot_bot_ -> origin/cherry-pick-158537-by-pytorch_bot_bot_ 
2025-09-07T06:39:17.5300665Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5300936Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-09-07T06:39:17.5301163Z * [new branch] chilli/flex_vllm -> origin/chilli/flex_vllm 2025-09-07T06:39:17.5301396Z * [new branch] cleanup-inductor-benchmark-images -> origin/cleanup-inductor-benchmark-images 2025-09-07T06:39:17.5301621Z * [new branch] codex-testing -> origin/codex-testing 2025-09-07T06:39:17.5301864Z * [new branch] codex/add-helper-function-to-sizevars.py -> origin/codex/add-helper-function-to-sizevars.py 2025-09-07T06:39:17.5302201Z * [new branch] codex/add-helper-function-to-sizevars.py_2025-09-05 -> origin/codex/add-helper-function-to-sizevars.py_2025-09-05 2025-09-07T06:39:17.5302526Z * [new branch] codex/add-metadata-field-for-file-path -> origin/codex/add-metadata-field-for-file-path 2025-09-07T06:39:17.5302865Z * [new branch] codex/add-test-for-inductor-local-cache-behavior -> origin/codex/add-test-for-inductor-local-cache-behavior 2025-09-07T06:39:17.5303290Z * [new branch] codex/create-test-for-tensor-memory-leak-in-cudagraph -> origin/codex/create-test-for-tensor-memory-leak-in-cudagraph 2025-09-07T06:39:17.5303615Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-09-07T06:39:17.5309261Z * [new branch] codex/fix-issue-160415-in-pytorch -> origin/codex/fix-issue-160415-in-pytorch 2025-09-07T06:39:17.5309651Z * [new branch] codex/fix-noqengine-quantized-engine-support -> origin/codex/fix-noqengine-quantized-engine-support 2025-09-07T06:39:17.5309959Z * [new branch] codex/fix-pin_memory-error-handling -> origin/codex/fix-pin_memory-error-handling 2025-09-07T06:39:17.5310228Z * [new branch] codex/propose-fix-for-issue-160332 -> origin/codex/propose-fix-for-issue-160332 2025-09-07T06:39:17.5310613Z * [new branch] 
codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-09-07T06:39:17.5310987Z * [new branch] codex/remove-allow-untyped-defs-and-fix-type-errors -> origin/codex/remove-allow-untyped-defs-and-fix-type-errors 2025-09-07T06:39:17.5311312Z * [new branch] compile_fsdp2_disable_stream_and_event -> origin/compile_fsdp2_disable_stream_and_event 2025-09-07T06:39:17.5311534Z * [new branch] context_test -> origin/context_test 2025-09-07T06:39:17.5311716Z * [new branch] copilot/fix-157446 -> origin/copilot/fix-157446 2025-09-07T06:39:17.5311892Z * [new branch] copy_graph -> origin/copy_graph 2025-09-07T06:39:17.5312073Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-09-07T06:39:17.5312274Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-09-07T06:39:17.5312474Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-09-07T06:39:17.5312675Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-09-07T06:39:17.5312881Z * [new branch] csl/disable_flaky_cpp_test -> origin/csl/disable_flaky_cpp_test 2025-09-07T06:39:17.5313092Z * [new branch] csl/disable_periodic_test -> origin/csl/disable_periodic_test 2025-09-07T06:39:17.5313306Z * [new branch] csl/exclude_rocm_viable_strict -> origin/csl/exclude_rocm_viable_strict 2025-09-07T06:39:17.5313527Z * [new branch] csl/katex -> origin/csl/katex 2025-09-07T06:39:17.5313695Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-09-07T06:39:17.5313881Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-09-07T06:39:17.5314074Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-09-07T06:39:17.5314257Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-09-07T06:39:17.5314453Z * [new branch] csl/name_link_check_job -> origin/csl/name_link_check_job 2025-09-07T06:39:17.5314641Z * [new branch] csl/no_keep_goin_rocm 
-> origin/csl/no_keep_goin_rocm 2025-09-07T06:39:17.5314824Z * [new branch] csl/not_600_timeout -> origin/csl/not_600_timeout 2025-09-07T06:39:17.5315005Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-09-07T06:39:17.5315181Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-09-07T06:39:17.5315381Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-09-07T06:39:17.5315587Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-09-07T06:39:17.5315766Z * [new branch] cublasltrelax2 -> origin/cublasltrelax2 2025-09-07T06:39:17.5315940Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-09-07T06:39:17.5316153Z * [new branch] cudnnsdparefactor -> origin/cudnnsdparefactor 2025-09-07T06:39:17.5316342Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-09-07T06:39:17.5316608Z * [new branch] czhuge_muon_dev -> origin/czhuge_muon_dev 2025-09-07T06:39:17.5316788Z * [new branch] d4l3k/delete_hook -> origin/d4l3k/delete_hook 2025-09-07T06:39:17.5316956Z * [new branch] dcp_zoc -> origin/dcp_zoc 2025-09-07T06:39:17.5317120Z * [new branch] debug-guard -> origin/debug-guard 2025-09-07T06:39:17.5317295Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-09-07T06:39:17.5317662Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 2025-09-07T06:39:17.5318178Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 2025-09-07T06:39:17.5318613Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 2025-09-07T06:39:17.5319162Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 -> 
origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 2025-09-07T06:39:17.5319536Z * [new branch] dependabot/pip/dot-ci/docker/protobuf-5.29.5 -> origin/dependabot/pip/dot-ci/docker/protobuf-5.29.5 2025-09-07T06:39:17.5319882Z * [new branch] dependabot/pip/dot-github/requirements/protobuf-5.29.5 -> origin/dependabot/pip/dot-github/requirements/protobuf-5.29.5 2025-09-07T06:39:17.5320176Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-09-07T06:39:17.5320415Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-09-07T06:39:17.5320645Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-09-07T06:39:17.5320843Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-09-07T06:39:17.5321043Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-09-07T06:39:17.5321231Z * [new branch] dev/joona/cat_remove_graph -> origin/dev/joona/cat_remove_graph 2025-09-07T06:39:17.5321435Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-09-07T06:39:17.5321649Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-09-07T06:39:17.5321895Z * [new branch] dev/joona/maxpool2dwithindices_errmsg -> origin/dev/joona/maxpool2dwithindices_errmsg 2025-09-07T06:39:17.5322141Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-09-07T06:39:17.5322336Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-09-07T06:39:17.5322520Z * [new branch] dev/joona/topk_newapi -> origin/dev/joona/topk_newapi 2025-09-07T06:39:17.5322707Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-09-07T06:39:17.5322892Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-09-07T06:39:17.5323066Z * [new branch] disable -> origin/disable 2025-09-07T06:39:17.5323232Z * [new branch] e2e-baseline -> origin/e2e-baseline 2025-09-07T06:39:17.5323416Z * 
[new branch] eigen_for_sparse_addmm_v2 -> origin/eigen_for_sparse_addmm_v2 2025-09-07T06:39:17.5323618Z * [new branch] embg/test_inductor_ci_128B -> origin/embg/test_inductor_ci_128B 2025-09-07T06:39:17.5323879Z * [new branch] embg/test_inductor_ci_base -> origin/embg/test_inductor_ci_base 2025-09-07T06:39:17.5324090Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-09-07T06:39:17.5324306Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-09-07T06:39:17.5324513Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-09-07T06:39:17.5324704Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-09-07T06:39:17.5327972Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-09-07T06:39:17.5328242Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-09-07T06:39:17.5328418Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-09-07T06:39:17.5328616Z * [new branch] example-convert-torch.nn -> origin/example-convert-torch.nn 2025-09-07T06:39:17.5328874Z * [new branch] exclamaforte/add-contiguous-threshold -> origin/exclamaforte/add-contiguous-threshold 2025-09-07T06:39:17.5329114Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-09-07T06:39:17.5329352Z * [new branch] exclamaforte/bump-transformer-version -> origin/exclamaforte/bump-transformer-version 2025-09-07T06:39:17.5329626Z * [new branch] exclamaforte/clear-feedback-savers -> origin/exclamaforte/clear-feedback-savers 2025-09-07T06:39:17.5329889Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-09-07T06:39:17.5330139Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-09-07T06:39:17.5330382Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-09-07T06:39:17.5330657Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> 
origin/exclamaforte/fix-exhaustive-autotuning 2025-09-07T06:39:17.5330969Z * [new branch] exclamaforte/fix-exhuastive-autotuning-reland -> origin/exclamaforte/fix-exhuastive-autotuning-reland 2025-09-07T06:39:17.5331278Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-09-07T06:39:17.5331570Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-09-07T06:39:17.5331835Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-09-07T06:39:17.5332064Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-09-07T06:39:17.5332305Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-09-07T06:39:17.5332527Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-09-07T06:39:17.5332788Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-09-07T06:39:17.5333048Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-09-07T06:39:17.5333263Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-09-07T06:39:17.5333524Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-09-07T06:39:17.5333787Z * [new branch] exclamaforte/max-autotune-ieee -> origin/exclamaforte/max-autotune-ieee 2025-09-07T06:39:17.5334016Z * [new branch] exclamaforte/memory-counter -> origin/exclamaforte/memory-counter 2025-09-07T06:39:17.5334243Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-09-07T06:39:17.5334534Z * [new branch] exclamaforte/profiler-combo -> origin/exclamaforte/profiler-combo 2025-09-07T06:39:17.5334767Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-09-07T06:39:17.5335032Z 
* [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-09-07T06:39:17.5335313Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-09-07T06:39:17.5335566Z * [new branch] exclamforte/gemm-model-final -> origin/exclamforte/gemm-model-final 2025-09-07T06:39:17.5335758Z * [new branch] exec -> origin/exec 2025-09-07T06:39:17.5335967Z * [new branch] executorch-module-shim -> origin/executorch-module-shim 2025-09-07T06:39:17.5336170Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-09-07T06:39:17.5336362Z * [new branch] export-D58091437 -> origin/export-D58091437 2025-09-07T06:39:17.5336638Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-09-07T06:39:17.5336812Z * [new branch] export-D70112642 -> origin/export-D70112642 2025-09-07T06:39:17.5336984Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-09-07T06:39:17.5337155Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-09-07T06:39:17.5337326Z * [new branch] export-D75183591 -> origin/export-D75183591 2025-09-07T06:39:17.5337499Z * [new branch] export-D75617432 -> origin/export-D75617432 2025-09-07T06:39:17.5337675Z * [new branch] export-D75659965 -> origin/export-D75659965 2025-09-07T06:39:17.5337846Z * [new branch] export-D76080931 -> origin/export-D76080931 2025-09-07T06:39:17.5338018Z * [new branch] export-D76797250 -> origin/export-D76797250 2025-09-07T06:39:17.5338193Z * [new branch] export-D76885271 -> origin/export-D76885271 2025-09-07T06:39:17.5338362Z * [new branch] export-D76885620 -> origin/export-D76885620 2025-09-07T06:39:17.5341106Z * [new branch] export-D76936623 -> origin/export-D76936623 2025-09-07T06:39:17.5341345Z * [new branch] export-D76958268 -> origin/export-D76958268 2025-09-07T06:39:17.5341557Z * [new branch] export-D78375400 -> origin/export-D78375400 2025-09-07T06:39:17.5341773Z * [new branch] export-D78431305 -> 
origin/export-D78431305 2025-09-07T06:39:17.5341989Z * [new branch] export-D78580107 -> origin/export-D78580107 2025-09-07T06:39:17.5342199Z * [new branch] export-D78822171 -> origin/export-D78822171 2025-09-07T06:39:17.5342411Z * [new branch] export-D78822351 -> origin/export-D78822351 2025-09-07T06:39:17.5342617Z * [new branch] export-D78822507 -> origin/export-D78822507 2025-09-07T06:39:17.5342834Z * [new branch] export-D78826994 -> origin/export-D78826994 2025-09-07T06:39:17.5343045Z * [new branch] export-D78894324 -> origin/export-D78894324 2025-09-07T06:39:17.5343253Z * [new branch] export-D78929245 -> origin/export-D78929245 2025-09-07T06:39:17.5343489Z * [new branch] export-D78934925 -> origin/export-D78934925 2025-09-07T06:39:17.5343683Z * [new branch] export-D78953203 -> origin/export-D78953203 2025-09-07T06:39:17.5343898Z * [new branch] export-D78953229 -> origin/export-D78953229 2025-09-07T06:39:17.5344116Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-09-07T06:39:17.5344311Z * [new branch] export-D78957389 -> origin/export-D78957389 2025-09-07T06:39:17.5344588Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-09-07T06:39:17.5344791Z * [new branch] export-D79026433 -> origin/export-D79026433 2025-09-07T06:39:17.5346643Z * [new branch] export-D79230339 -> origin/export-D79230339 2025-09-07T06:39:17.5346840Z * [new branch] export-D79319835 -> origin/export-D79319835 2025-09-07T06:39:17.5347014Z * [new branch] export-D79328456 -> origin/export-D79328456 2025-09-07T06:39:17.5347189Z * [new branch] export-D79534608 -> origin/export-D79534608 2025-09-07T06:39:17.5347426Z * [new branch] export-D79785974 -> origin/export-D79785974 2025-09-07T06:39:17.5347596Z * [new branch] export-D80025417 -> origin/export-D80025417 2025-09-07T06:39:17.5347766Z * [new branch] export-D80120333 -> origin/export-D80120333 2025-09-07T06:39:17.5347945Z * [new branch] export-D80214882 -> origin/export-D80214882 2025-09-07T06:39:17.5348119Z * [new 
branch] export-D80319069 -> origin/export-D80319069 2025-09-07T06:39:17.5348288Z * [new branch] export-D80321215 -> origin/export-D80321215 2025-09-07T06:39:17.5348459Z * [new branch] export-D80503451 -> origin/export-D80503451 2025-09-07T06:39:17.5348631Z * [new branch] export-D80771648 -> origin/export-D80771648 2025-09-07T06:39:17.5350156Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-09-07T06:39:17.5350347Z * [new branch] export-D80948073 -> origin/export-D80948073 2025-09-07T06:39:17.5350522Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-09-07T06:39:17.5350697Z * [new branch] export-D80970483 -> origin/export-D80970483 2025-09-07T06:39:17.5350868Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-09-07T06:39:17.5351039Z * [new branch] export-D81060182 -> origin/export-D81060182 2025-09-07T06:39:17.5351213Z * [new branch] export-D81078973 -> origin/export-D81078973 2025-09-07T06:39:17.5351385Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-09-07T06:39:17.5351558Z * [new branch] export-D81284190 -> origin/export-D81284190 2025-09-07T06:39:17.5351729Z * [new branch] export-D81299840 -> origin/export-D81299840 2025-09-07T06:39:17.5351907Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-09-07T06:39:17.5353307Z * [new branch] export-D81698719 -> origin/export-D81698719 2025-09-07T06:39:17.5353492Z * [new branch] export-D81747409 -> origin/export-D81747409 2025-09-07T06:39:17.5353712Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-09-07T06:39:17.5353957Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-09-07T06:39:17.5354154Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-09-07T06:39:17.5354329Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-09-07T06:39:17.5354506Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-09-07T06:39:17.5354669Z * [new branch] fca -> origin/fca 
2025-09-07T06:39:17.5354829Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-09-07T06:39:17.5354988Z * [new branch] fca5 -> origin/fca5 2025-09-07T06:39:17.5355190Z * [new branch] feature/function-numa-binding -> origin/feature/function-numa-binding 2025-09-07T06:39:17.5355509Z * [new branch] feature/function-numa-binding-take2 -> origin/feature/function-numa-binding-take2 2025-09-07T06:39:17.5355752Z * [new branch] feature/numa-nproc-fix -> origin/feature/numa-nproc-fix 2025-09-07T06:39:17.5357303Z * [new branch] feature/numa-signpost-serialize -> origin/feature/numa-signpost-serialize 2025-09-07T06:39:17.5357566Z * [new branch] feature/parallel-numa-binding -> origin/feature/parallel-numa-binding 2025-09-07T06:39:17.5357811Z * [new branch] fengyuan/external-proj -> origin/fengyuan/external-proj 2025-09-07T06:39:17.5358131Z * [new branch] fengyuan/out-of-tree-xpu-ops-improve-test -> origin/fengyuan/out-of-tree-xpu-ops-improve-test 2025-09-07T06:39:17.5358495Z * [new branch] fengyuan/out-of-tree-xpu-ops-remove-dtype -> origin/fengyuan/out-of-tree-xpu-ops-remove-dtype 2025-09-07T06:39:17.5358748Z * [new branch] fengyuan/test-xpu -> origin/fengyuan/test-xpu 2025-09-07T06:39:17.5358938Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-09-07T06:39:17.5359118Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-09-07T06:39:17.5359301Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-09-07T06:39:17.5359492Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-09-07T06:39:17.5359686Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-09-07T06:39:17.5359877Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-09-07T06:39:17.5360070Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-09-07T06:39:17.5360271Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-09-07T06:39:17.5360477Z * [new branch] 
findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-09-07T06:39:17.5360673Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-09-07T06:39:17.5360848Z * [new branch] fix -> origin/fix 2025-09-07T06:39:17.5361040Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-09-07T06:39:17.5361245Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-09-07T06:39:17.5361451Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-09-07T06:39:17.5361706Z * [new branch] fix-inductor-periodic-0528 -> origin/fix-inductor-periodic-0528 2025-09-07T06:39:17.5361972Z * [new branch] fix-mps-benchmark -> origin/fix-mps-benchmark 2025-09-07T06:39:17.5363495Z * [new branch] fix-rlease-feature-template -> origin/fix-rlease-feature-template 2025-09-07T06:39:17.5363769Z * [new branch] fix-run-condition-upload-results -> origin/fix-run-condition-upload-results 2025-09-07T06:39:17.5363996Z * [new branch] fix-torchbench -> origin/fix-torchbench 2025-09-07T06:39:17.5364169Z * [new branch] fix_153389 -> origin/fix_153389 2025-09-07T06:39:17.5364342Z * [new branch] fix_fsdp_rs_bucket2 -> origin/fix_fsdp_rs_bucket2 2025-09-07T06:39:17.5364534Z * [new branch] fix_inductor_peridic_tests -> origin/fix_inductor_peridic_tests 2025-09-07T06:39:17.5364719Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-09-07T06:39:17.5364886Z * [new branch] fixes-triage -> origin/fixes-triage 2025-09-07T06:39:17.5365059Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-09-07T06:39:17.5365236Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-09-07T06:39:17.5365464Z * [new branch] flex-flash -> origin/flex-flash 2025-09-07T06:39:17.5367417Z * [new branch] flex-lowering -> origin/flex-lowering 2025-09-07T06:39:17.5367595Z * [new branch] flex-warning -> origin/flex-warning 2025-09-07T06:39:17.5367797Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 
2025-09-07T06:39:17.5367990Z * [new branch] flex_flash -> origin/flex_flash 2025-09-07T06:39:17.5368177Z * [new branch] flexdecode-gqa-groups -> origin/flexdecode-gqa-groups 2025-09-07T06:39:17.5368395Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-09-07T06:39:17.5368659Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-09-07T06:39:17.5368831Z * [new branch] fsdpv2_3d -> origin/fsdpv2_3d 2025-09-07T06:39:17.5368998Z * [new branch] fsdpv2_3d_m1 -> origin/fsdpv2_3d_m1 2025-09-07T06:39:17.5369160Z * [new branch] fx_cpp -> origin/fx_cpp 2025-09-07T06:39:17.5369323Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-09-07T06:39:17.5369502Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-09-07T06:39:17.5369683Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-09-07T06:39:17.5369857Z * [new branch] gh/CaoE/2/base -> origin/gh/CaoE/2/base 2025-09-07T06:39:17.5370026Z * [new branch] gh/CaoE/2/head -> origin/gh/CaoE/2/head 2025-09-07T06:39:17.5370195Z * [new branch] gh/CaoE/2/orig -> origin/gh/CaoE/2/orig 2025-09-07T06:39:17.5370386Z * [new branch] gh/ColinPeppler/79/base -> origin/gh/ColinPeppler/79/base 2025-09-07T06:39:17.5370595Z * [new branch] gh/ColinPeppler/79/head -> origin/gh/ColinPeppler/79/head 2025-09-07T06:39:17.5370797Z * [new branch] gh/ColinPeppler/79/orig -> origin/gh/ColinPeppler/79/orig 2025-09-07T06:39:17.5370994Z * [new branch] gh/ColinPeppler/80/base -> origin/gh/ColinPeppler/80/base 2025-09-07T06:39:17.5371191Z * [new branch] gh/ColinPeppler/80/head -> origin/gh/ColinPeppler/80/head 2025-09-07T06:39:17.5371388Z * [new branch] gh/ColinPeppler/80/orig -> origin/gh/ColinPeppler/80/orig 2025-09-07T06:39:17.5373179Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-09-07T06:39:17.5373388Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-09-07T06:39:17.5373585Z * [new branch] gh/EikanWang/80/base -> 
origin/gh/EikanWang/80/base 2025-09-07T06:39:17.5373773Z * [new branch] gh/EikanWang/80/head -> origin/gh/EikanWang/80/head 2025-09-07T06:39:17.5373960Z * [new branch] gh/EikanWang/80/orig -> origin/gh/EikanWang/80/orig 2025-09-07T06:39:17.5374145Z * [new branch] gh/EikanWang/81/base -> origin/gh/EikanWang/81/base 2025-09-07T06:39:17.5374328Z * [new branch] gh/EikanWang/81/head -> origin/gh/EikanWang/81/head 2025-09-07T06:39:17.5374511Z * [new branch] gh/EikanWang/81/orig -> origin/gh/EikanWang/81/orig 2025-09-07T06:39:17.5374693Z * [new branch] gh/EikanWang/82/base -> origin/gh/EikanWang/82/base 2025-09-07T06:39:17.5374879Z * [new branch] gh/EikanWang/82/head -> origin/gh/EikanWang/82/head 2025-09-07T06:39:17.5375064Z * [new branch] gh/EikanWang/82/orig -> origin/gh/EikanWang/82/orig 2025-09-07T06:39:17.5375251Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-09-07T06:39:17.5376671Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-09-07T06:39:17.5376936Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-09-07T06:39:17.5377115Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-09-07T06:39:17.5377291Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-09-07T06:39:17.5377467Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-09-07T06:39:17.5377644Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-09-07T06:39:17.5377819Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-09-07T06:39:17.5378050Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-09-07T06:39:17.5378227Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-09-07T06:39:17.5378402Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-09-07T06:39:17.5378581Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-09-07T06:39:17.5378754Z * [new branch] gh/H-Huang/182/head -> 
origin/gh/H-Huang/182/head 2025-09-07T06:39:17.5378931Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-09-07T06:39:17.5379108Z * [new branch] gh/H-Huang/187/base -> origin/gh/H-Huang/187/base 2025-09-07T06:39:17.5379283Z * [new branch] gh/H-Huang/187/head -> origin/gh/H-Huang/187/head 2025-09-07T06:39:17.5379458Z * [new branch] gh/H-Huang/187/orig -> origin/gh/H-Huang/187/orig 2025-09-07T06:39:17.5379634Z * [new branch] gh/H-Huang/202/base -> origin/gh/H-Huang/202/base 2025-09-07T06:39:17.5379810Z * [new branch] gh/H-Huang/202/head -> origin/gh/H-Huang/202/head 2025-09-07T06:39:17.5381056Z * [new branch] gh/H-Huang/202/orig -> origin/gh/H-Huang/202/orig 2025-09-07T06:39:17.5381246Z * [new branch] gh/H-Huang/203/base -> origin/gh/H-Huang/203/base 2025-09-07T06:39:17.5381424Z * [new branch] gh/H-Huang/203/head -> origin/gh/H-Huang/203/head 2025-09-07T06:39:17.5381602Z * [new branch] gh/H-Huang/203/orig -> origin/gh/H-Huang/203/orig 2025-09-07T06:39:17.5381778Z * [new branch] gh/H-Huang/204/base -> origin/gh/H-Huang/204/base 2025-09-07T06:39:17.5381956Z * [new branch] gh/H-Huang/204/head -> origin/gh/H-Huang/204/head 2025-09-07T06:39:17.5382132Z * [new branch] gh/H-Huang/204/orig -> origin/gh/H-Huang/204/orig 2025-09-07T06:39:17.5382314Z * [new branch] gh/H-Huang/205/base -> origin/gh/H-Huang/205/base 2025-09-07T06:39:17.5382495Z * [new branch] gh/H-Huang/205/head -> origin/gh/H-Huang/205/head 2025-09-07T06:39:17.5382677Z * [new branch] gh/H-Huang/205/orig -> origin/gh/H-Huang/205/orig 2025-09-07T06:39:17.5382855Z * [new branch] gh/H-Huang/206/base -> origin/gh/H-Huang/206/base 2025-09-07T06:39:17.5383035Z * [new branch] gh/H-Huang/206/head -> origin/gh/H-Huang/206/head 2025-09-07T06:39:17.5384382Z * [new branch] gh/H-Huang/206/orig -> origin/gh/H-Huang/206/orig 2025-09-07T06:39:17.5384568Z * [new branch] gh/H-Huang/207/base -> origin/gh/H-Huang/207/base 2025-09-07T06:39:17.5384745Z * [new branch] gh/H-Huang/207/head -> 
origin/gh/H-Huang/207/head 2025-09-07T06:39:17.5384921Z * [new branch] gh/H-Huang/207/orig -> origin/gh/H-Huang/207/orig 2025-09-07T06:39:17.5385100Z * [new branch] gh/H-Huang/208/base -> origin/gh/H-Huang/208/base 2025-09-07T06:39:17.5385277Z * [new branch] gh/H-Huang/208/head -> origin/gh/H-Huang/208/head 2025-09-07T06:39:17.5385495Z * [new branch] gh/H-Huang/208/orig -> origin/gh/H-Huang/208/orig 2025-09-07T06:39:17.5385672Z * [new branch] gh/H-Huang/209/base -> origin/gh/H-Huang/209/base 2025-09-07T06:39:17.5385849Z * [new branch] gh/H-Huang/209/head -> origin/gh/H-Huang/209/head 2025-09-07T06:39:17.5386026Z * [new branch] gh/H-Huang/209/orig -> origin/gh/H-Huang/209/orig 2025-09-07T06:39:17.5386205Z * [new branch] gh/H-Huang/210/base -> origin/gh/H-Huang/210/base 2025-09-07T06:39:17.5386382Z * [new branch] gh/H-Huang/210/head -> origin/gh/H-Huang/210/head 2025-09-07T06:39:17.5387863Z * [new branch] gh/H-Huang/210/orig -> origin/gh/H-Huang/210/orig 2025-09-07T06:39:17.5388049Z * [new branch] gh/H-Huang/211/base -> origin/gh/H-Huang/211/base 2025-09-07T06:39:17.5388225Z * [new branch] gh/H-Huang/211/head -> origin/gh/H-Huang/211/head 2025-09-07T06:39:17.5388408Z * [new branch] gh/H-Huang/211/orig -> origin/gh/H-Huang/211/orig 2025-09-07T06:39:17.5388585Z * [new branch] gh/H-Huang/212/base -> origin/gh/H-Huang/212/base 2025-09-07T06:39:17.5388761Z * [new branch] gh/H-Huang/212/head -> origin/gh/H-Huang/212/head 2025-09-07T06:39:17.5388939Z * [new branch] gh/H-Huang/212/orig -> origin/gh/H-Huang/212/orig 2025-09-07T06:39:17.5389116Z * [new branch] gh/H-Huang/213/base -> origin/gh/H-Huang/213/base 2025-09-07T06:39:17.5389292Z * [new branch] gh/H-Huang/213/head -> origin/gh/H-Huang/213/head 2025-09-07T06:39:17.5389472Z * [new branch] gh/H-Huang/213/orig -> origin/gh/H-Huang/213/orig 2025-09-07T06:39:17.5389651Z * [new branch] gh/H-Huang/214/base -> origin/gh/H-Huang/214/base 2025-09-07T06:39:17.5389829Z * [new branch] gh/H-Huang/214/head -> 
origin/gh/H-Huang/214/head 2025-09-07T06:39:17.5390009Z * [new branch] gh/H-Huang/214/orig -> origin/gh/H-Huang/214/orig 2025-09-07T06:39:17.5390204Z * [new branch] gh/IvanKobzarev/112/base -> origin/gh/IvanKobzarev/112/base 2025-09-07T06:39:17.5390412Z * [new branch] gh/IvanKobzarev/112/head -> origin/gh/IvanKobzarev/112/head 2025-09-07T06:39:17.5390615Z * [new branch] gh/IvanKobzarev/112/orig -> origin/gh/IvanKobzarev/112/orig 2025-09-07T06:39:17.5390815Z * [new branch] gh/IvanKobzarev/115/base -> origin/gh/IvanKobzarev/115/base 2025-09-07T06:39:17.5392276Z * [new branch] gh/IvanKobzarev/115/head -> origin/gh/IvanKobzarev/115/head 2025-09-07T06:39:17.5392498Z * [new branch] gh/IvanKobzarev/115/orig -> origin/gh/IvanKobzarev/115/orig 2025-09-07T06:39:17.5392699Z * [new branch] gh/IvanKobzarev/116/base -> origin/gh/IvanKobzarev/116/base 2025-09-07T06:39:17.5392900Z * [new branch] gh/IvanKobzarev/116/head -> origin/gh/IvanKobzarev/116/head 2025-09-07T06:39:17.5393098Z * [new branch] gh/IvanKobzarev/116/orig -> origin/gh/IvanKobzarev/116/orig 2025-09-07T06:39:17.5393296Z * [new branch] gh/IvanKobzarev/118/base -> origin/gh/IvanKobzarev/118/base 2025-09-07T06:39:17.5393501Z * [new branch] gh/IvanKobzarev/118/head -> origin/gh/IvanKobzarev/118/head 2025-09-07T06:39:17.5393700Z * [new branch] gh/IvanKobzarev/118/orig -> origin/gh/IvanKobzarev/118/orig 2025-09-07T06:39:17.5393899Z * [new branch] gh/IvanKobzarev/126/base -> origin/gh/IvanKobzarev/126/base 2025-09-07T06:39:17.5394103Z * [new branch] gh/IvanKobzarev/126/head -> origin/gh/IvanKobzarev/126/head 2025-09-07T06:39:17.5394302Z * [new branch] gh/IvanKobzarev/126/orig -> origin/gh/IvanKobzarev/126/orig 2025-09-07T06:39:17.5395788Z * [new branch] gh/IvanKobzarev/127/base -> origin/gh/IvanKobzarev/127/base 2025-09-07T06:39:17.5396062Z * [new branch] gh/IvanKobzarev/127/head -> origin/gh/IvanKobzarev/127/head 2025-09-07T06:39:17.5396263Z * [new branch] gh/IvanKobzarev/127/orig -> origin/gh/IvanKobzarev/127/orig 
2025-09-07T06:39:17.5396462Z * [new branch] gh/IvanKobzarev/128/base -> origin/gh/IvanKobzarev/128/base 2025-09-07T06:39:17.5396749Z * [new branch] gh/IvanKobzarev/128/head -> origin/gh/IvanKobzarev/128/head 2025-09-07T06:39:17.5396946Z * [new branch] gh/IvanKobzarev/128/orig -> origin/gh/IvanKobzarev/128/orig 2025-09-07T06:39:17.5397143Z * [new branch] gh/IvanKobzarev/132/base -> origin/gh/IvanKobzarev/132/base 2025-09-07T06:39:17.5397387Z * [new branch] gh/IvanKobzarev/132/head -> origin/gh/IvanKobzarev/132/head 2025-09-07T06:39:17.5397586Z * [new branch] gh/IvanKobzarev/132/orig -> origin/gh/IvanKobzarev/132/orig 2025-09-07T06:39:17.5397792Z * [new branch] gh/IvanKobzarev/133/base -> origin/gh/IvanKobzarev/133/base 2025-09-07T06:39:17.5398096Z * [new branch] gh/IvanKobzarev/133/head -> origin/gh/IvanKobzarev/133/head 2025-09-07T06:39:17.5398298Z * [new branch] gh/IvanKobzarev/133/orig -> origin/gh/IvanKobzarev/133/orig 2025-09-07T06:39:17.5398498Z * [new branch] gh/IvanKobzarev/134/base -> origin/gh/IvanKobzarev/134/base 2025-09-07T06:39:17.5398696Z * [new branch] gh/IvanKobzarev/134/head -> origin/gh/IvanKobzarev/134/head 2025-09-07T06:39:17.5398895Z * [new branch] gh/IvanKobzarev/134/orig -> origin/gh/IvanKobzarev/134/orig 2025-09-07T06:39:17.5399097Z * [new branch] gh/IvanKobzarev/135/base -> origin/gh/IvanKobzarev/135/base 2025-09-07T06:39:17.5399294Z * [new branch] gh/IvanKobzarev/135/head -> origin/gh/IvanKobzarev/135/head 2025-09-07T06:39:17.5399491Z * [new branch] gh/IvanKobzarev/135/orig -> origin/gh/IvanKobzarev/135/orig 2025-09-07T06:39:17.5399696Z * [new branch] gh/IvanKobzarev/136/base -> origin/gh/IvanKobzarev/136/base 2025-09-07T06:39:17.5399897Z * [new branch] gh/IvanKobzarev/136/head -> origin/gh/IvanKobzarev/136/head 2025-09-07T06:39:17.5400098Z * [new branch] gh/IvanKobzarev/136/orig -> origin/gh/IvanKobzarev/136/orig 2025-09-07T06:39:17.5400296Z * [new branch] gh/IvanKobzarev/137/base -> origin/gh/IvanKobzarev/137/base 
2025-09-07T06:39:17.5400514Z * [new branch] gh/IvanKobzarev/137/head -> origin/gh/IvanKobzarev/137/head 2025-09-07T06:39:17.5402079Z * [new branch] gh/IvanKobzarev/137/orig -> origin/gh/IvanKobzarev/137/orig 2025-09-07T06:39:17.5402287Z * [new branch] gh/IvanKobzarev/138/base -> origin/gh/IvanKobzarev/138/base 2025-09-07T06:39:17.5402485Z * [new branch] gh/IvanKobzarev/138/head -> origin/gh/IvanKobzarev/138/head 2025-09-07T06:39:17.5402685Z * [new branch] gh/IvanKobzarev/138/orig -> origin/gh/IvanKobzarev/138/orig 2025-09-07T06:39:17.5402882Z * [new branch] gh/IvanKobzarev/139/base -> origin/gh/IvanKobzarev/139/base 2025-09-07T06:39:17.5403077Z * [new branch] gh/IvanKobzarev/139/head -> origin/gh/IvanKobzarev/139/head 2025-09-07T06:39:17.5403279Z * [new branch] gh/IvanKobzarev/139/orig -> origin/gh/IvanKobzarev/139/orig 2025-09-07T06:39:17.5403476Z * [new branch] gh/IvanKobzarev/140/base -> origin/gh/IvanKobzarev/140/base 2025-09-07T06:39:17.5403674Z * [new branch] gh/IvanKobzarev/140/head -> origin/gh/IvanKobzarev/140/head 2025-09-07T06:39:17.5403875Z * [new branch] gh/IvanKobzarev/140/orig -> origin/gh/IvanKobzarev/140/orig 2025-09-07T06:39:17.5404072Z * [new branch] gh/IvanKobzarev/141/base -> origin/gh/IvanKobzarev/141/base 2025-09-07T06:39:17.5405915Z * [new branch] gh/IvanKobzarev/141/head -> origin/gh/IvanKobzarev/141/head 2025-09-07T06:39:17.5406130Z * [new branch] gh/IvanKobzarev/141/orig -> origin/gh/IvanKobzarev/141/orig 2025-09-07T06:39:17.5406334Z * [new branch] gh/IvanKobzarev/142/base -> origin/gh/IvanKobzarev/142/base 2025-09-07T06:39:17.5406627Z * [new branch] gh/IvanKobzarev/142/head -> origin/gh/IvanKobzarev/142/head 2025-09-07T06:39:17.5406837Z * [new branch] gh/IvanKobzarev/142/orig -> origin/gh/IvanKobzarev/142/orig 2025-09-07T06:39:17.5407034Z * [new branch] gh/IvanKobzarev/143/base -> origin/gh/IvanKobzarev/143/base 2025-09-07T06:39:17.5407241Z * [new branch] gh/IvanKobzarev/143/head -> origin/gh/IvanKobzarev/143/head 
2025-09-07T06:39:17.5407500Z * [new branch] gh/IvanKobzarev/143/orig -> origin/gh/IvanKobzarev/143/orig 2025-09-07T06:39:17.5407701Z * [new branch] gh/IvanKobzarev/144/base -> origin/gh/IvanKobzarev/144/base 2025-09-07T06:39:17.5407904Z * [new branch] gh/IvanKobzarev/144/head -> origin/gh/IvanKobzarev/144/head 2025-09-07T06:39:17.5408104Z * [new branch] gh/IvanKobzarev/144/orig -> origin/gh/IvanKobzarev/144/orig 2025-09-07T06:39:17.5408312Z * [new branch] gh/IvanKobzarev/145/base -> origin/gh/IvanKobzarev/145/base 2025-09-07T06:39:17.5408514Z * [new branch] gh/IvanKobzarev/145/head -> origin/gh/IvanKobzarev/145/head 2025-09-07T06:39:17.5408714Z * [new branch] gh/IvanKobzarev/145/orig -> origin/gh/IvanKobzarev/145/orig 2025-09-07T06:39:17.5408911Z * [new branch] gh/IvanKobzarev/146/base -> origin/gh/IvanKobzarev/146/base 2025-09-07T06:39:17.5409119Z * [new branch] gh/IvanKobzarev/146/head -> origin/gh/IvanKobzarev/146/head 2025-09-07T06:39:17.5409318Z * [new branch] gh/IvanKobzarev/146/orig -> origin/gh/IvanKobzarev/146/orig 2025-09-07T06:39:17.5409529Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-09-07T06:39:17.5409730Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-09-07T06:39:17.5409923Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-09-07T06:39:17.5410119Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-09-07T06:39:17.5410314Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-09-07T06:39:17.5410534Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-09-07T06:39:17.5410803Z * [new branch] gh/PaliC/1/base -> origin/gh/PaliC/1/base 2025-09-07T06:39:17.5412379Z * [new branch] gh/PaliC/1/head -> origin/gh/PaliC/1/head 2025-09-07T06:39:17.5412562Z * [new branch] gh/PaliC/1/orig -> origin/gh/PaliC/1/orig 2025-09-07T06:39:17.5412746Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 
2025-09-07T06:39:17.5412925Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-09-07T06:39:17.5413102Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-09-07T06:39:17.5413278Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-09-07T06:39:17.5413453Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-09-07T06:39:17.5413625Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-09-07T06:39:17.5413801Z * [new branch] gh/PaliC/2/base -> origin/gh/PaliC/2/base 2025-09-07T06:39:17.5413978Z * [new branch] gh/PaliC/2/head -> origin/gh/PaliC/2/head 2025-09-07T06:39:17.5414149Z * [new branch] gh/PaliC/2/orig -> origin/gh/PaliC/2/orig 2025-09-07T06:39:17.5415639Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-09-07T06:39:17.5415827Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-09-07T06:39:17.5415999Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-09-07T06:39:17.5416172Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-09-07T06:39:17.5416342Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-09-07T06:39:17.5416622Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-09-07T06:39:17.5416797Z * [new branch] gh/PaliC/22/base -> origin/gh/PaliC/22/base 2025-09-07T06:39:17.5417007Z * [new branch] gh/PaliC/22/head -> origin/gh/PaliC/22/head 2025-09-07T06:39:17.5417184Z * [new branch] gh/PaliC/22/orig -> origin/gh/PaliC/22/orig 2025-09-07T06:39:17.5417362Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-09-07T06:39:17.5417535Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-09-07T06:39:17.5417707Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-09-07T06:39:17.5419128Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-09-07T06:39:17.5419309Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-09-07T06:39:17.5419484Z * [new branch] gh/PaliC/24/orig -> 
origin/gh/PaliC/24/orig 2025-09-07T06:39:17.5419677Z * [new branch] gh/PaulZhang12/17/base -> origin/gh/PaulZhang12/17/base 2025-09-07T06:39:17.5419876Z * [new branch] gh/PaulZhang12/17/head -> origin/gh/PaulZhang12/17/head 2025-09-07T06:39:17.5420071Z * [new branch] gh/PaulZhang12/20/base -> origin/gh/PaulZhang12/20/base 2025-09-07T06:39:17.5420268Z * [new branch] gh/PaulZhang12/20/head -> origin/gh/PaulZhang12/20/head 2025-09-07T06:39:17.5420462Z * [new branch] gh/PaulZhang12/20/orig -> origin/gh/PaulZhang12/20/orig 2025-09-07T06:39:17.5420655Z * [new branch] gh/PaulZhang12/21/base -> origin/gh/PaulZhang12/21/base 2025-09-07T06:39:17.5420848Z * [new branch] gh/PaulZhang12/21/head -> origin/gh/PaulZhang12/21/head 2025-09-07T06:39:17.5421039Z * [new branch] gh/PaulZhang12/21/orig -> origin/gh/PaulZhang12/21/orig 2025-09-07T06:39:17.5421231Z * [new branch] gh/PaulZhang12/22/base -> origin/gh/PaulZhang12/22/base 2025-09-07T06:39:17.5421422Z * [new branch] gh/PaulZhang12/22/head -> origin/gh/PaulZhang12/22/head 2025-09-07T06:39:17.5421621Z * [new branch] gh/PaulZhang12/22/orig -> origin/gh/PaulZhang12/22/orig 2025-09-07T06:39:17.5421812Z * [new branch] gh/PaulZhang12/23/base -> origin/gh/PaulZhang12/23/base 2025-09-07T06:39:17.5423232Z * [new branch] gh/PaulZhang12/23/head -> origin/gh/PaulZhang12/23/head 2025-09-07T06:39:17.5423431Z * [new branch] gh/PaulZhang12/23/orig -> origin/gh/PaulZhang12/23/orig 2025-09-07T06:39:17.5423623Z * [new branch] gh/PaulZhang12/24/base -> origin/gh/PaulZhang12/24/base 2025-09-07T06:39:17.5423812Z * [new branch] gh/PaulZhang12/24/head -> origin/gh/PaulZhang12/24/head 2025-09-07T06:39:17.5424005Z * [new branch] gh/PaulZhang12/24/orig -> origin/gh/PaulZhang12/24/orig 2025-09-07T06:39:17.5424200Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-09-07T06:39:17.5424401Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-09-07T06:39:17.5424590Z * [new branch] gh/PaulZhang12/25/orig -> 
origin/gh/PaulZhang12/25/orig 2025-09-07T06:39:17.5424866Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-09-07T06:39:17.5425059Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-09-07T06:39:17.5425261Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-09-07T06:39:17.5426668Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-09-07T06:39:17.5426893Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-09-07T06:39:17.5427100Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-09-07T06:39:17.5427303Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-09-07T06:39:17.5427554Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-09-07T06:39:17.5427750Z * [new branch] gh/StrongerXi/133/base -> origin/gh/StrongerXi/133/base 2025-09-07T06:39:17.5427948Z * [new branch] gh/StrongerXi/133/head -> origin/gh/StrongerXi/133/head 2025-09-07T06:39:17.5428137Z * [new branch] gh/StrongerXi/133/orig -> origin/gh/StrongerXi/133/orig 2025-09-07T06:39:17.5428328Z * [new branch] gh/StrongerXi/134/base -> origin/gh/StrongerXi/134/base 2025-09-07T06:39:17.5428521Z * [new branch] gh/StrongerXi/134/head -> origin/gh/StrongerXi/134/head 2025-09-07T06:39:17.5428719Z * [new branch] gh/StrongerXi/134/orig -> origin/gh/StrongerXi/134/orig 2025-09-07T06:39:17.5428909Z * [new branch] gh/StrongerXi/136/base -> origin/gh/StrongerXi/136/base 2025-09-07T06:39:17.5429103Z * [new branch] gh/StrongerXi/136/head -> origin/gh/StrongerXi/136/head 2025-09-07T06:39:17.5429292Z * [new branch] gh/StrongerXi/136/orig -> origin/gh/StrongerXi/136/orig 2025-09-07T06:39:17.5429482Z * [new branch] gh/StrongerXi/137/base -> origin/gh/StrongerXi/137/base 2025-09-07T06:39:17.5429676Z * [new branch] gh/StrongerXi/137/head -> origin/gh/StrongerXi/137/head 2025-09-07T06:39:17.5429870Z * [new branch] 
gh/StrongerXi/137/orig -> origin/gh/StrongerXi/137/orig 2025-09-07T06:39:17.5430059Z * [new branch] gh/StrongerXi/138/base -> origin/gh/StrongerXi/138/base 2025-09-07T06:39:17.5430249Z * [new branch] gh/StrongerXi/138/head -> origin/gh/StrongerXi/138/head 2025-09-07T06:39:17.5431661Z * [new branch] gh/StrongerXi/138/orig -> origin/gh/StrongerXi/138/orig 2025-09-07T06:39:17.5431863Z * [new branch] gh/StrongerXi/139/base -> origin/gh/StrongerXi/139/base 2025-09-07T06:39:17.5432058Z * [new branch] gh/StrongerXi/139/head -> origin/gh/StrongerXi/139/head 2025-09-07T06:39:17.5432247Z * [new branch] gh/StrongerXi/139/orig -> origin/gh/StrongerXi/139/orig 2025-09-07T06:39:17.5432442Z * [new branch] gh/StrongerXi/140/base -> origin/gh/StrongerXi/140/base 2025-09-07T06:39:17.5432635Z * [new branch] gh/StrongerXi/140/head -> origin/gh/StrongerXi/140/head 2025-09-07T06:39:17.5432823Z * [new branch] gh/StrongerXi/140/orig -> origin/gh/StrongerXi/140/orig 2025-09-07T06:39:17.5433019Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-09-07T06:39:17.5433213Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-09-07T06:39:17.5433405Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-09-07T06:39:17.5433594Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-09-07T06:39:17.5433785Z * [new branch] gh/XilunWu/133/base -> origin/gh/XilunWu/133/base 2025-09-07T06:39:17.5435180Z * [new branch] gh/XilunWu/133/head -> origin/gh/XilunWu/133/head 2025-09-07T06:39:17.5435433Z * [new branch] gh/XilunWu/133/orig -> origin/gh/XilunWu/133/orig 2025-09-07T06:39:17.5435617Z * [new branch] gh/XilunWu/139/base -> origin/gh/XilunWu/139/base 2025-09-07T06:39:17.5435801Z * [new branch] gh/XilunWu/139/head -> origin/gh/XilunWu/139/head 2025-09-07T06:39:17.5435987Z * [new branch] gh/XilunWu/139/orig -> origin/gh/XilunWu/139/orig 2025-09-07T06:39:17.5436167Z * [new branch] gh/XilunWu/143/base -> 
origin/gh/XilunWu/143/base 2025-09-07T06:39:17.5436348Z * [new branch] gh/XilunWu/143/head -> origin/gh/XilunWu/143/head 2025-09-07T06:39:17.5436649Z * [new branch] gh/XilunWu/143/orig -> origin/gh/XilunWu/143/orig 2025-09-07T06:39:17.5436833Z * [new branch] gh/XilunWu/144/base -> origin/gh/XilunWu/144/base 2025-09-07T06:39:17.5437014Z * [new branch] gh/XilunWu/144/head -> origin/gh/XilunWu/144/head 2025-09-07T06:39:17.5437199Z * [new branch] gh/XilunWu/144/orig -> origin/gh/XilunWu/144/orig 2025-09-07T06:39:17.5437385Z * [new branch] gh/XilunWu/145/base -> origin/gh/XilunWu/145/base 2025-09-07T06:39:17.5437564Z * [new branch] gh/XilunWu/145/head -> origin/gh/XilunWu/145/head 2025-09-07T06:39:17.5437745Z * [new branch] gh/XilunWu/145/orig -> origin/gh/XilunWu/145/orig 2025-09-07T06:39:17.5437925Z * [new branch] gh/XilunWu/146/base -> origin/gh/XilunWu/146/base 2025-09-07T06:39:17.5438182Z * [new branch] gh/XilunWu/146/head -> origin/gh/XilunWu/146/head 2025-09-07T06:39:17.5438364Z * [new branch] gh/XilunWu/146/orig -> origin/gh/XilunWu/146/orig 2025-09-07T06:39:17.5438545Z * [new branch] gh/XilunWu/147/base -> origin/gh/XilunWu/147/base 2025-09-07T06:39:17.5438728Z * [new branch] gh/XilunWu/147/head -> origin/gh/XilunWu/147/head 2025-09-07T06:39:17.5440180Z * [new branch] gh/XilunWu/147/orig -> origin/gh/XilunWu/147/orig 2025-09-07T06:39:17.5440378Z * [new branch] gh/XilunWu/148/base -> origin/gh/XilunWu/148/base 2025-09-07T06:39:17.5440563Z * [new branch] gh/XilunWu/148/head -> origin/gh/XilunWu/148/head 2025-09-07T06:39:17.5440742Z * [new branch] gh/XilunWu/148/orig -> origin/gh/XilunWu/148/orig 2025-09-07T06:39:17.5440926Z * [new branch] gh/XilunWu/149/base -> origin/gh/XilunWu/149/base 2025-09-07T06:39:17.5441110Z * [new branch] gh/XilunWu/149/head -> origin/gh/XilunWu/149/head 2025-09-07T06:39:17.5441293Z * [new branch] gh/XilunWu/149/orig -> origin/gh/XilunWu/149/orig 2025-09-07T06:39:17.5441476Z * [new branch] gh/XilunWu/150/base -> 
origin/gh/XilunWu/150/base 2025-09-07T06:39:17.5441658Z * [new branch] gh/XilunWu/150/head -> origin/gh/XilunWu/150/head 2025-09-07T06:39:17.5441838Z * [new branch] gh/XilunWu/150/orig -> origin/gh/XilunWu/150/orig 2025-09-07T06:39:17.5442017Z * [new branch] gh/XilunWu/151/base -> origin/gh/XilunWu/151/base 2025-09-07T06:39:17.5442196Z * [new branch] gh/XilunWu/151/head -> origin/gh/XilunWu/151/head 2025-09-07T06:39:17.5443596Z * [new branch] gh/XilunWu/151/orig -> origin/gh/XilunWu/151/orig 2025-09-07T06:39:17.5443784Z * [new branch] gh/XilunWu/152/base -> origin/gh/XilunWu/152/base 2025-09-07T06:39:17.5443970Z * [new branch] gh/XilunWu/152/head -> origin/gh/XilunWu/152/head 2025-09-07T06:39:17.5444151Z * [new branch] gh/XilunWu/152/orig -> origin/gh/XilunWu/152/orig 2025-09-07T06:39:17.5444337Z * [new branch] gh/XilunWu/153/base -> origin/gh/XilunWu/153/base 2025-09-07T06:39:17.5444571Z * [new branch] gh/XilunWu/153/head -> origin/gh/XilunWu/153/head 2025-09-07T06:39:17.5444753Z * [new branch] gh/XilunWu/153/orig -> origin/gh/XilunWu/153/orig 2025-09-07T06:39:17.5444933Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-09-07T06:39:17.5445113Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-09-07T06:39:17.5445293Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-09-07T06:39:17.5445473Z * [new branch] gh/XilunWu/161/base -> origin/gh/XilunWu/161/base 2025-09-07T06:39:17.5445656Z * [new branch] gh/XilunWu/161/head -> origin/gh/XilunWu/161/head 2025-09-07T06:39:17.5445876Z * [new branch] gh/XilunWu/161/orig -> origin/gh/XilunWu/161/orig 2025-09-07T06:39:17.5446058Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-09-07T06:39:17.5446241Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-09-07T06:39:17.5446423Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-09-07T06:39:17.5446675Z * [new branch] gh/XilunWu/164/base -> 
origin/gh/XilunWu/164/base 2025-09-07T06:39:17.5446857Z * [new branch] gh/XilunWu/164/head -> origin/gh/XilunWu/164/head 2025-09-07T06:39:17.5447039Z * [new branch] gh/XilunWu/164/orig -> origin/gh/XilunWu/164/orig 2025-09-07T06:39:17.5447222Z * [new branch] gh/XilunWu/165/base -> origin/gh/XilunWu/165/base 2025-09-07T06:39:17.5447408Z * [new branch] gh/XilunWu/165/head -> origin/gh/XilunWu/165/head 2025-09-07T06:39:17.5447586Z * [new branch] gh/XilunWu/165/orig -> origin/gh/XilunWu/165/orig 2025-09-07T06:39:17.5449305Z * [new branch] gh/XilunWu/166/base -> origin/gh/XilunWu/166/base 2025-09-07T06:39:17.5449496Z * [new branch] gh/XilunWu/166/head -> origin/gh/XilunWu/166/head 2025-09-07T06:39:17.5449681Z * [new branch] gh/XilunWu/166/orig -> origin/gh/XilunWu/166/orig 2025-09-07T06:39:17.5449861Z * [new branch] gh/XilunWu/167/base -> origin/gh/XilunWu/167/base 2025-09-07T06:39:17.5450042Z * [new branch] gh/XilunWu/167/head -> origin/gh/XilunWu/167/head 2025-09-07T06:39:17.5450221Z * [new branch] gh/XilunWu/167/orig -> origin/gh/XilunWu/167/orig 2025-09-07T06:39:17.5450400Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-09-07T06:39:17.5450584Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-09-07T06:39:17.5450765Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-09-07T06:39:17.5450944Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-09-07T06:39:17.5451129Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-09-07T06:39:17.5451313Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-09-07T06:39:17.5451494Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-09-07T06:39:17.5451673Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-09-07T06:39:17.5451854Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-09-07T06:39:17.5452047Z * [new branch] gh/XuehaiPan/14/base -> 
origin/gh/XuehaiPan/14/base 2025-09-07T06:39:17.5452242Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-09-07T06:39:17.5453718Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-09-07T06:39:17.5453980Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-09-07T06:39:17.5454176Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-09-07T06:39:17.5454367Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-09-07T06:39:17.5454554Z * [new branch] gh/XuehaiPan/189/base -> origin/gh/XuehaiPan/189/base 2025-09-07T06:39:17.5454743Z * [new branch] gh/XuehaiPan/189/head -> origin/gh/XuehaiPan/189/head 2025-09-07T06:39:17.5454929Z * [new branch] gh/XuehaiPan/189/orig -> origin/gh/XuehaiPan/189/orig 2025-09-07T06:39:17.5455119Z * [new branch] gh/XuehaiPan/232/base -> origin/gh/XuehaiPan/232/base 2025-09-07T06:39:17.5455352Z * [new branch] gh/XuehaiPan/232/head -> origin/gh/XuehaiPan/232/head 2025-09-07T06:39:17.5455542Z * [new branch] gh/XuehaiPan/232/orig -> origin/gh/XuehaiPan/232/orig 2025-09-07T06:39:17.5455736Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-09-07T06:39:17.5455931Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-09-07T06:39:17.5456120Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-09-07T06:39:17.5456313Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-09-07T06:39:17.5456583Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-09-07T06:39:17.5456772Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-09-07T06:39:17.5456960Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-09-07T06:39:17.5457152Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-09-07T06:39:17.5457338Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 
2025-09-07T06:39:17.5457533Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-09-07T06:39:17.5457722Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-09-07T06:39:17.5457910Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-09-07T06:39:17.5458099Z * [new branch] gh/XuehaiPan/257/base -> origin/gh/XuehaiPan/257/base 2025-09-07T06:39:17.5458287Z * [new branch] gh/XuehaiPan/257/head -> origin/gh/XuehaiPan/257/head 2025-09-07T06:39:17.5458529Z * [new branch] gh/XuehaiPan/257/orig -> origin/gh/XuehaiPan/257/orig 2025-09-07T06:39:17.5458745Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-09-07T06:39:17.5459017Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-09-07T06:39:17.5460583Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-09-07T06:39:17.5460784Z * [new branch] gh/XuehaiPan/290/base -> origin/gh/XuehaiPan/290/base 2025-09-07T06:39:17.5460975Z * [new branch] gh/XuehaiPan/290/head -> origin/gh/XuehaiPan/290/head 2025-09-07T06:39:17.5461163Z * [new branch] gh/XuehaiPan/290/orig -> origin/gh/XuehaiPan/290/orig 2025-09-07T06:39:17.5461355Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-09-07T06:39:17.5461548Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-09-07T06:39:17.5461740Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-09-07T06:39:17.5461932Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-09-07T06:39:17.5462122Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-09-07T06:39:17.5462366Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-09-07T06:39:17.5462560Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-09-07T06:39:17.5462749Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-09-07T06:39:17.5462941Z * [new 
branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-09-07T06:39:17.5463128Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-09-07T06:39:17.5464570Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-09-07T06:39:17.5464769Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-09-07T06:39:17.5465002Z * [new branch] gh/XuehaiPan/356/base -> origin/gh/XuehaiPan/356/base 2025-09-07T06:39:17.5465189Z * [new branch] gh/XuehaiPan/356/head -> origin/gh/XuehaiPan/356/head 2025-09-07T06:39:17.5465380Z * [new branch] gh/XuehaiPan/356/orig -> origin/gh/XuehaiPan/356/orig 2025-09-07T06:39:17.5465569Z * [new branch] gh/XuehaiPan/357/base -> origin/gh/XuehaiPan/357/base 2025-09-07T06:39:17.5465755Z * [new branch] gh/XuehaiPan/357/head -> origin/gh/XuehaiPan/357/head 2025-09-07T06:39:17.5465944Z * [new branch] gh/XuehaiPan/357/orig -> origin/gh/XuehaiPan/357/orig 2025-09-07T06:39:17.5466131Z * [new branch] gh/XuehaiPan/358/base -> origin/gh/XuehaiPan/358/base 2025-09-07T06:39:17.5466319Z * [new branch] gh/XuehaiPan/358/head -> origin/gh/XuehaiPan/358/head 2025-09-07T06:39:17.5466602Z * [new branch] gh/XuehaiPan/358/orig -> origin/gh/XuehaiPan/358/orig 2025-09-07T06:39:17.5468043Z * [new branch] gh/XuehaiPan/359/base -> origin/gh/XuehaiPan/359/base 2025-09-07T06:39:17.5468246Z * [new branch] gh/XuehaiPan/359/head -> origin/gh/XuehaiPan/359/head 2025-09-07T06:39:17.5468440Z * [new branch] gh/XuehaiPan/359/orig -> origin/gh/XuehaiPan/359/orig 2025-09-07T06:39:17.5468630Z * [new branch] gh/XuehaiPan/360/base -> origin/gh/XuehaiPan/360/base 2025-09-07T06:39:17.5468824Z * [new branch] gh/XuehaiPan/360/head -> origin/gh/XuehaiPan/360/head 2025-09-07T06:39:17.5469017Z * [new branch] gh/XuehaiPan/360/orig -> origin/gh/XuehaiPan/360/orig 2025-09-07T06:39:17.5469207Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-09-07T06:39:17.5469394Z * [new branch] gh/XuehaiPan/365/head -> 
origin/gh/XuehaiPan/365/head 2025-09-07T06:39:17.5469582Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-09-07T06:39:17.5469771Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-09-07T06:39:17.5469959Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-09-07T06:39:17.5470146Z * [new branch] gh/XuehaiPan/369/base -> origin/gh/XuehaiPan/369/base 2025-09-07T06:39:17.5470335Z * [new branch] gh/XuehaiPan/369/head -> origin/gh/XuehaiPan/369/head 2025-09-07T06:39:17.5470525Z * [new branch] gh/XuehaiPan/369/orig -> origin/gh/XuehaiPan/369/orig 2025-09-07T06:39:17.5470712Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-09-07T06:39:17.5470897Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-09-07T06:39:17.5471090Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-09-07T06:39:17.5471283Z * [new branch] gh/XuehaiPan/380/base -> origin/gh/XuehaiPan/380/base 2025-09-07T06:39:17.5472686Z * [new branch] gh/XuehaiPan/380/head -> origin/gh/XuehaiPan/380/head 2025-09-07T06:39:17.5472948Z * [new branch] gh/XuehaiPan/380/orig -> origin/gh/XuehaiPan/380/orig 2025-09-07T06:39:17.5473140Z * [new branch] gh/XuehaiPan/381/base -> origin/gh/XuehaiPan/381/base 2025-09-07T06:39:17.5473330Z * [new branch] gh/XuehaiPan/381/head -> origin/gh/XuehaiPan/381/head 2025-09-07T06:39:17.5473528Z * [new branch] gh/XuehaiPan/382/base -> origin/gh/XuehaiPan/382/base 2025-09-07T06:39:17.5473717Z * [new branch] gh/XuehaiPan/382/head -> origin/gh/XuehaiPan/382/head 2025-09-07T06:39:17.5473907Z * [new branch] gh/XuehaiPan/382/orig -> origin/gh/XuehaiPan/382/orig 2025-09-07T06:39:17.5474133Z * [new branch] gh/XuehaiPan/383/base -> origin/gh/XuehaiPan/383/base 2025-09-07T06:39:17.5474322Z * [new branch] gh/XuehaiPan/383/head -> origin/gh/XuehaiPan/383/head 2025-09-07T06:39:17.5474511Z * [new branch] gh/XuehaiPan/383/orig -> origin/gh/XuehaiPan/383/orig 
2025-09-07T06:39:17.5474703Z * [new branch] gh/XuehaiPan/384/base -> origin/gh/XuehaiPan/384/base 2025-09-07T06:39:17.5474894Z * [new branch] gh/XuehaiPan/384/head -> origin/gh/XuehaiPan/384/head 2025-09-07T06:39:17.5476447Z * [new branch] gh/XuehaiPan/384/orig -> origin/gh/XuehaiPan/384/orig 2025-09-07T06:39:17.5476738Z * [new branch] gh/XuehaiPan/385/base -> origin/gh/XuehaiPan/385/base 2025-09-07T06:39:17.5476931Z * [new branch] gh/XuehaiPan/385/head -> origin/gh/XuehaiPan/385/head 2025-09-07T06:39:17.5477122Z * [new branch] gh/XuehaiPan/385/orig -> origin/gh/XuehaiPan/385/orig 2025-09-07T06:39:17.5477311Z * [new branch] gh/XuehaiPan/386/base -> origin/gh/XuehaiPan/386/base 2025-09-07T06:39:17.5477498Z * [new branch] gh/XuehaiPan/386/head -> origin/gh/XuehaiPan/386/head 2025-09-07T06:39:17.5477686Z * [new branch] gh/XuehaiPan/386/orig -> origin/gh/XuehaiPan/386/orig 2025-09-07T06:39:17.5477873Z * [new branch] gh/XuehaiPan/387/base -> origin/gh/XuehaiPan/387/base 2025-09-07T06:39:17.5478129Z * [new branch] gh/XuehaiPan/387/head -> origin/gh/XuehaiPan/387/head 2025-09-07T06:39:17.5478317Z * [new branch] gh/XuehaiPan/387/orig -> origin/gh/XuehaiPan/387/orig 2025-09-07T06:39:17.5478513Z * [new branch] gh/ZainRizvi/1/base -> origin/gh/ZainRizvi/1/base 2025-09-07T06:39:17.5478697Z * [new branch] gh/ZainRizvi/1/head -> origin/gh/ZainRizvi/1/head 2025-09-07T06:39:17.5478882Z * [new branch] gh/ZainRizvi/2/base -> origin/gh/ZainRizvi/2/base 2025-09-07T06:39:17.5479066Z * [new branch] gh/ZainRizvi/2/head -> origin/gh/ZainRizvi/2/head 2025-09-07T06:39:17.5479254Z * [new branch] gh/ZainRizvi/3/base -> origin/gh/ZainRizvi/3/base 2025-09-07T06:39:17.5479436Z * [new branch] gh/ZainRizvi/3/head -> origin/gh/ZainRizvi/3/head 2025-09-07T06:39:17.5479617Z * [new branch] gh/ZainRizvi/4/base -> origin/gh/ZainRizvi/4/base 2025-09-07T06:39:17.5479801Z * [new branch] gh/ZainRizvi/4/head -> origin/gh/ZainRizvi/4/head 2025-09-07T06:39:17.5479982Z * [new branch] gh/ZainRizvi/5/base -> 
origin/gh/ZainRizvi/5/base 2025-09-07T06:39:17.5480162Z * [new branch] gh/ZainRizvi/5/head -> origin/gh/ZainRizvi/5/head 2025-09-07T06:39:17.5480342Z * [new branch] gh/ZainRizvi/6/base -> origin/gh/ZainRizvi/6/base 2025-09-07T06:39:17.5480525Z * [new branch] gh/ZainRizvi/6/head -> origin/gh/ZainRizvi/6/head 2025-09-07T06:39:17.5480706Z * [new branch] gh/ZainRizvi/6/orig -> origin/gh/ZainRizvi/6/orig 2025-09-07T06:39:17.5482145Z * [new branch] gh/ZainRizvi/7/base -> origin/gh/ZainRizvi/7/base 2025-09-07T06:39:17.5482408Z * [new branch] gh/ZainRizvi/7/head -> origin/gh/ZainRizvi/7/head 2025-09-07T06:39:17.5482592Z * [new branch] gh/ZainRizvi/7/orig -> origin/gh/ZainRizvi/7/orig 2025-09-07T06:39:17.5482773Z * [new branch] gh/ZainRizvi/8/base -> origin/gh/ZainRizvi/8/base 2025-09-07T06:39:17.5482954Z * [new branch] gh/ZainRizvi/8/head -> origin/gh/ZainRizvi/8/head 2025-09-07T06:39:17.5483136Z * [new branch] gh/ZainRizvi/9/base -> origin/gh/ZainRizvi/9/base 2025-09-07T06:39:17.5483320Z * [new branch] gh/ZainRizvi/9/head -> origin/gh/ZainRizvi/9/head 2025-09-07T06:39:17.5483538Z * [new branch] gh/ZainRizvi/9/orig -> origin/gh/ZainRizvi/9/orig 2025-09-07T06:39:17.5483731Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-09-07T06:39:17.5483927Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-09-07T06:39:17.5484122Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-09-07T06:39:17.5484316Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-09-07T06:39:17.5485744Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-09-07T06:39:17.5485948Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-09-07T06:39:17.5486143Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-09-07T06:39:17.5486335Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-09-07T06:39:17.5486587Z 
* [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-09-07T06:39:17.5486777Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-09-07T06:39:17.5486971Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-09-07T06:39:17.5487163Z * [new branch] gh/ZhiweiYan-96/64/base -> origin/gh/ZhiweiYan-96/64/base 2025-09-07T06:39:17.5487354Z * [new branch] gh/ZhiweiYan-96/64/head -> origin/gh/ZhiweiYan-96/64/head 2025-09-07T06:39:17.5487546Z * [new branch] gh/ZhiweiYan-96/64/orig -> origin/gh/ZhiweiYan-96/64/orig 2025-09-07T06:39:17.5487736Z * [new branch] gh/ZhiweiYan-96/65/base -> origin/gh/ZhiweiYan-96/65/base 2025-09-07T06:39:17.5489308Z * [new branch] gh/ZhiweiYan-96/65/head -> origin/gh/ZhiweiYan-96/65/head 2025-09-07T06:39:17.5489518Z * [new branch] gh/ZhiweiYan-96/65/orig -> origin/gh/ZhiweiYan-96/65/orig 2025-09-07T06:39:17.5489711Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-09-07T06:39:17.5489901Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-09-07T06:39:17.5490093Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-09-07T06:39:17.5490288Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-09-07T06:39:17.5490478Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-09-07T06:39:17.5490666Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-09-07T06:39:17.5490857Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-09-07T06:39:17.5491043Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-09-07T06:39:17.5491231Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-09-07T06:39:17.5491417Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-09-07T06:39:17.5491658Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 
2025-09-07T06:39:17.5491848Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-09-07T06:39:17.5492037Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-09-07T06:39:17.5492225Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-09-07T06:39:17.5492495Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-09-07T06:39:17.5492763Z * [new branch] gh/alexsamardzic/9/base -> origin/gh/alexsamardzic/9/base 2025-09-07T06:39:17.5492997Z * [new branch] gh/alexsamardzic/9/head -> origin/gh/alexsamardzic/9/head 2025-09-07T06:39:17.5494432Z * [new branch] gh/alexsamardzic/9/orig -> origin/gh/alexsamardzic/9/orig 2025-09-07T06:39:17.5494631Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-09-07T06:39:17.5494815Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-09-07T06:39:17.5494998Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-09-07T06:39:17.5495187Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-09-07T06:39:17.5495375Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-09-07T06:39:17.5495561Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-09-07T06:39:17.5495747Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-09-07T06:39:17.5495932Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-09-07T06:39:17.5496121Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-09-07T06:39:17.5496308Z * [new branch] gh/andrewor14/51/base -> origin/gh/andrewor14/51/base 2025-09-07T06:39:17.5497899Z * [new branch] gh/andrewor14/51/orig -> origin/gh/andrewor14/51/orig 2025-09-07T06:39:17.5498110Z * [new branch] gh/andyanwang/1/base -> origin/gh/andyanwang/1/base 2025-09-07T06:39:17.5498304Z * [new branch] gh/andyanwang/1/head -> origin/gh/andyanwang/1/head 
2025-09-07T06:39:17.5498490Z * [new branch] gh/andyanwang/1/orig -> origin/gh/andyanwang/1/orig 2025-09-07T06:39:17.5498681Z * [new branch] gh/andyanwang/13/base -> origin/gh/andyanwang/13/base 2025-09-07T06:39:17.5498871Z * [new branch] gh/andyanwang/13/head -> origin/gh/andyanwang/13/head 2025-09-07T06:39:17.5499059Z * [new branch] gh/andyanwang/13/orig -> origin/gh/andyanwang/13/orig 2025-09-07T06:39:17.5499248Z * [new branch] gh/andyanwang/2/base -> origin/gh/andyanwang/2/base 2025-09-07T06:39:17.5499437Z * [new branch] gh/andyanwang/2/head -> origin/gh/andyanwang/2/head 2025-09-07T06:39:17.5499625Z * [new branch] gh/andyanwang/2/orig -> origin/gh/andyanwang/2/orig 2025-09-07T06:39:17.5499814Z * [new branch] gh/andyanwang/28/base -> origin/gh/andyanwang/28/base 2025-09-07T06:39:17.5500003Z * [new branch] gh/andyanwang/28/head -> origin/gh/andyanwang/28/head 2025-09-07T06:39:17.5500195Z * [new branch] gh/andyanwang/28/orig -> origin/gh/andyanwang/28/orig 2025-09-07T06:39:17.5500381Z * [new branch] gh/andyanwang/3/base -> origin/gh/andyanwang/3/base 2025-09-07T06:39:17.5500571Z * [new branch] gh/andyanwang/3/head -> origin/gh/andyanwang/3/head 2025-09-07T06:39:17.5500761Z * [new branch] gh/andyanwang/3/orig -> origin/gh/andyanwang/3/orig 2025-09-07T06:39:17.5500951Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-09-07T06:39:17.5501189Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-09-07T06:39:17.5501377Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-09-07T06:39:17.5501565Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-09-07T06:39:17.5503139Z * [new branch] gh/andyanwang/32/base -> origin/gh/andyanwang/32/base 2025-09-07T06:39:17.5503337Z * [new branch] gh/andyanwang/32/head -> origin/gh/andyanwang/32/head 2025-09-07T06:39:17.5503529Z * [new branch] gh/andyanwang/32/orig -> origin/gh/andyanwang/32/orig 2025-09-07T06:39:17.5503765Z * [new branch] 
gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-09-07T06:39:17.5503950Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-09-07T06:39:17.5504146Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-09-07T06:39:17.5504333Z * [new branch] gh/andyanwang/4/base -> origin/gh/andyanwang/4/base 2025-09-07T06:39:17.5504520Z * [new branch] gh/andyanwang/4/head -> origin/gh/andyanwang/4/head 2025-09-07T06:39:17.5504707Z * [new branch] gh/andyanwang/4/orig -> origin/gh/andyanwang/4/orig 2025-09-07T06:39:17.5504893Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-09-07T06:39:17.5505079Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-09-07T06:39:17.5505263Z * [new branch] gh/angelayi/111/base -> origin/gh/angelayi/111/base 2025-09-07T06:39:17.5506801Z * [new branch] gh/angelayi/111/head -> origin/gh/angelayi/111/head 2025-09-07T06:39:17.5506995Z * [new branch] gh/angelayi/111/orig -> origin/gh/angelayi/111/orig 2025-09-07T06:39:17.5507181Z * [new branch] gh/angelayi/112/base -> origin/gh/angelayi/112/base 2025-09-07T06:39:17.5507367Z * [new branch] gh/angelayi/112/head -> origin/gh/angelayi/112/head 2025-09-07T06:39:17.5507552Z * [new branch] gh/angelayi/112/orig -> origin/gh/angelayi/112/orig 2025-09-07T06:39:17.5507742Z * [new branch] gh/angelayi/113/base -> origin/gh/angelayi/113/base 2025-09-07T06:39:17.5507928Z * [new branch] gh/angelayi/113/head -> origin/gh/angelayi/113/head 2025-09-07T06:39:17.5508111Z * [new branch] gh/angelayi/113/orig -> origin/gh/angelayi/113/orig 2025-09-07T06:39:17.5508296Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-09-07T06:39:17.5508480Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-09-07T06:39:17.5508664Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-09-07T06:39:17.5508850Z * [new branch] gh/angelayi/115/base -> origin/gh/angelayi/115/base 
2025-09-07T06:39:17.5509036Z * [new branch] gh/angelayi/115/head -> origin/gh/angelayi/115/head 2025-09-07T06:39:17.5509220Z * [new branch] gh/angelayi/115/orig -> origin/gh/angelayi/115/orig 2025-09-07T06:39:17.5509416Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-09-07T06:39:17.5509611Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-09-07T06:39:17.5509804Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-09-07T06:39:17.5510003Z * [new branch] gh/anijain2305/766/base -> origin/gh/anijain2305/766/base 2025-09-07T06:39:17.5510194Z * [new branch] gh/anijain2305/766/head -> origin/gh/anijain2305/766/head 2025-09-07T06:39:17.5510384Z * [new branch] gh/anijain2305/766/orig -> origin/gh/anijain2305/766/orig 2025-09-07T06:39:17.5510623Z * [new branch] gh/anijain2305/790/base -> origin/gh/anijain2305/790/base 2025-09-07T06:39:17.5510813Z * [new branch] gh/anijain2305/790/head -> origin/gh/anijain2305/790/head 2025-09-07T06:39:17.5512278Z * [new branch] gh/anijain2305/790/orig -> origin/gh/anijain2305/790/orig 2025-09-07T06:39:17.5512487Z * [new branch] gh/anijain2305/792/base -> origin/gh/anijain2305/792/base 2025-09-07T06:39:17.5512679Z * [new branch] gh/anijain2305/792/head -> origin/gh/anijain2305/792/head 2025-09-07T06:39:17.5512869Z * [new branch] gh/anijain2305/792/orig -> origin/gh/anijain2305/792/orig 2025-09-07T06:39:17.5513110Z * [new branch] gh/anijain2305/803/base -> origin/gh/anijain2305/803/base 2025-09-07T06:39:17.5513301Z * [new branch] gh/anijain2305/803/head -> origin/gh/anijain2305/803/head 2025-09-07T06:39:17.5513498Z * [new branch] gh/anijain2305/803/orig -> origin/gh/anijain2305/803/orig 2025-09-07T06:39:17.5513688Z * [new branch] gh/anijain2305/804/base -> origin/gh/anijain2305/804/base 2025-09-07T06:39:17.5513880Z * [new branch] gh/anijain2305/804/head -> origin/gh/anijain2305/804/head 2025-09-07T06:39:17.5514069Z * [new branch] gh/anijain2305/804/orig -> 
origin/gh/anijain2305/804/orig 2025-09-07T06:39:17.5514260Z * [new branch] gh/anijain2305/805/base -> origin/gh/anijain2305/805/base 2025-09-07T06:39:17.5514452Z * [new branch] gh/anijain2305/805/head -> origin/gh/anijain2305/805/head 2025-09-07T06:39:17.5515844Z * [new branch] gh/anijain2305/805/orig -> origin/gh/anijain2305/805/orig 2025-09-07T06:39:17.5516046Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-09-07T06:39:17.5516238Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-09-07T06:39:17.5516434Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-09-07T06:39:17.5516705Z * [new branch] gh/anijain2305/812/base -> origin/gh/anijain2305/812/base 2025-09-07T06:39:17.5516894Z * [new branch] gh/anijain2305/812/head -> origin/gh/anijain2305/812/head 2025-09-07T06:39:17.5517084Z * [new branch] gh/anijain2305/812/orig -> origin/gh/anijain2305/812/orig 2025-09-07T06:39:17.5517282Z * [new branch] gh/anijain2305/838/base -> origin/gh/anijain2305/838/base 2025-09-07T06:39:17.5517473Z * [new branch] gh/anijain2305/838/head -> origin/gh/anijain2305/838/head 2025-09-07T06:39:17.5517665Z * [new branch] gh/anijain2305/838/orig -> origin/gh/anijain2305/838/orig 2025-09-07T06:39:17.5517856Z * [new branch] gh/anijain2305/839/base -> origin/gh/anijain2305/839/base 2025-09-07T06:39:17.5519324Z * [new branch] gh/anijain2305/839/head -> origin/gh/anijain2305/839/head 2025-09-07T06:39:17.5519529Z * [new branch] gh/anijain2305/839/orig -> origin/gh/anijain2305/839/orig 2025-09-07T06:39:17.5519724Z * [new branch] gh/anijain2305/843/base -> origin/gh/anijain2305/843/base 2025-09-07T06:39:17.5519917Z * [new branch] gh/anijain2305/843/head -> origin/gh/anijain2305/843/head 2025-09-07T06:39:17.5520107Z * [new branch] gh/anijain2305/843/orig -> origin/gh/anijain2305/843/orig 2025-09-07T06:39:17.5520299Z * [new branch] gh/anijain2305/844/base -> origin/gh/anijain2305/844/base 2025-09-07T06:39:17.5520493Z * 
[new branch] gh/anijain2305/844/head -> origin/gh/anijain2305/844/head 2025-09-07T06:39:17.5520685Z * [new branch] gh/anijain2305/844/orig -> origin/gh/anijain2305/844/orig 2025-09-07T06:39:17.5520880Z * [new branch] gh/anijain2305/846/base -> origin/gh/anijain2305/846/base 2025-09-07T06:39:17.5521147Z * [new branch] gh/anijain2305/846/head -> origin/gh/anijain2305/846/head 2025-09-07T06:39:17.5521342Z * [new branch] gh/anijain2305/846/orig -> origin/gh/anijain2305/846/orig 2025-09-07T06:39:17.5521531Z * [new branch] gh/anijain2305/848/base -> origin/gh/anijain2305/848/base 2025-09-07T06:39:17.5521723Z * [new branch] gh/anijain2305/848/head -> origin/gh/anijain2305/848/head 2025-09-07T06:39:17.5521915Z * [new branch] gh/anijain2305/848/orig -> origin/gh/anijain2305/848/orig 2025-09-07T06:39:17.5522103Z * [new branch] gh/anijain2305/849/base -> origin/gh/anijain2305/849/base 2025-09-07T06:39:17.5522326Z * [new branch] gh/anijain2305/849/head -> origin/gh/anijain2305/849/head 2025-09-07T06:39:17.5522516Z * [new branch] gh/anijain2305/849/orig -> origin/gh/anijain2305/849/orig 2025-09-07T06:39:17.5522707Z * [new branch] gh/anijain2305/850/base -> origin/gh/anijain2305/850/base 2025-09-07T06:39:17.5524159Z * [new branch] gh/anijain2305/850/head -> origin/gh/anijain2305/850/head 2025-09-07T06:39:17.5524357Z * [new branch] gh/anijain2305/850/orig -> origin/gh/anijain2305/850/orig 2025-09-07T06:39:17.5524549Z * [new branch] gh/anijain2305/851/base -> origin/gh/anijain2305/851/base 2025-09-07T06:39:17.5524739Z * [new branch] gh/anijain2305/851/head -> origin/gh/anijain2305/851/head 2025-09-07T06:39:17.5524931Z * [new branch] gh/anijain2305/851/orig -> origin/gh/anijain2305/851/orig 2025-09-07T06:39:17.5525125Z * [new branch] gh/anijain2305/852/base -> origin/gh/anijain2305/852/base 2025-09-07T06:39:17.5525320Z * [new branch] gh/anijain2305/852/head -> origin/gh/anijain2305/852/head 2025-09-07T06:39:17.5525508Z * [new branch] gh/anijain2305/852/orig -> 
origin/gh/anijain2305/852/orig 2025-09-07T06:39:17.5525705Z * [new branch] gh/anijain2305/853/base -> origin/gh/anijain2305/853/base 2025-09-07T06:39:17.5525894Z * [new branch] gh/anijain2305/853/head -> origin/gh/anijain2305/853/head 2025-09-07T06:39:17.5527445Z * [new branch] gh/anijain2305/853/orig -> origin/gh/anijain2305/853/orig 2025-09-07T06:39:17.5527645Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-09-07T06:39:17.5527839Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-09-07T06:39:17.5528036Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-09-07T06:39:17.5528226Z * [new branch] gh/anijain2305/855/base -> origin/gh/anijain2305/855/base 2025-09-07T06:39:17.5528421Z * [new branch] gh/anijain2305/855/head -> origin/gh/anijain2305/855/head 2025-09-07T06:39:17.5528612Z * [new branch] gh/anijain2305/855/orig -> origin/gh/anijain2305/855/orig 2025-09-07T06:39:17.5528803Z * [new branch] gh/anijain2305/856/base -> origin/gh/anijain2305/856/base 2025-09-07T06:39:17.5528995Z * [new branch] gh/anijain2305/856/head -> origin/gh/anijain2305/856/head 2025-09-07T06:39:17.5529188Z * [new branch] gh/anijain2305/856/orig -> origin/gh/anijain2305/856/orig 2025-09-07T06:39:17.5529378Z * [new branch] gh/anijain2305/857/base -> origin/gh/anijain2305/857/base 2025-09-07T06:39:17.5529570Z * [new branch] gh/anijain2305/857/head -> origin/gh/anijain2305/857/head 2025-09-07T06:39:17.5529762Z * [new branch] gh/anijain2305/857/orig -> origin/gh/anijain2305/857/orig 2025-09-07T06:39:17.5529955Z * [new branch] gh/anijain2305/858/base -> origin/gh/anijain2305/858/base 2025-09-07T06:39:17.5530144Z * [new branch] gh/anijain2305/858/head -> origin/gh/anijain2305/858/head 2025-09-07T06:39:17.5530389Z * [new branch] gh/anijain2305/858/orig -> origin/gh/anijain2305/858/orig 2025-09-07T06:39:17.5530581Z * [new branch] gh/anijain2305/859/base -> origin/gh/anijain2305/859/base 2025-09-07T06:39:17.5530776Z * 
[new branch] gh/anijain2305/859/head -> origin/gh/anijain2305/859/head 2025-09-07T06:39:17.5530970Z * [new branch] gh/anijain2305/859/orig -> origin/gh/anijain2305/859/orig 2025-09-07T06:39:17.5531162Z * [new branch] gh/anijain2305/860/base -> origin/gh/anijain2305/860/base 2025-09-07T06:39:17.5531353Z * [new branch] gh/anijain2305/860/head -> origin/gh/anijain2305/860/head 2025-09-07T06:39:17.5531587Z * [new branch] gh/anijain2305/860/orig -> origin/gh/anijain2305/860/orig 2025-09-07T06:39:17.5531781Z * [new branch] gh/anijain2305/861/base -> origin/gh/anijain2305/861/base 2025-09-07T06:39:17.5531977Z * [new branch] gh/anijain2305/861/head -> origin/gh/anijain2305/861/head 2025-09-07T06:39:17.5532168Z * [new branch] gh/anijain2305/861/orig -> origin/gh/anijain2305/861/orig 2025-09-07T06:39:17.5532361Z * [new branch] gh/anijain2305/862/base -> origin/gh/anijain2305/862/base 2025-09-07T06:39:17.5532551Z * [new branch] gh/anijain2305/862/head -> origin/gh/anijain2305/862/head 2025-09-07T06:39:17.5532746Z * [new branch] gh/anijain2305/862/orig -> origin/gh/anijain2305/862/orig 2025-09-07T06:39:17.5532981Z * [new branch] gh/anijain2305/863/base -> origin/gh/anijain2305/863/base 2025-09-07T06:39:17.5533251Z * [new branch] gh/anijain2305/863/head -> origin/gh/anijain2305/863/head 2025-09-07T06:39:17.5533489Z * [new branch] gh/anijain2305/863/orig -> origin/gh/anijain2305/863/orig 2025-09-07T06:39:17.5533706Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-09-07T06:39:17.5533950Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-09-07T06:39:17.5534172Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-09-07T06:39:17.5534398Z * [new branch] gh/anijain2305/865/base -> origin/gh/anijain2305/865/base 2025-09-07T06:39:17.5534636Z * [new branch] gh/anijain2305/865/head -> origin/gh/anijain2305/865/head 2025-09-07T06:39:17.5534854Z * [new branch] gh/anijain2305/865/orig -> 
origin/gh/anijain2305/865/orig 2025-09-07T06:39:17.5535102Z * [new branch] gh/anijain2305/866/base -> origin/gh/anijain2305/866/base 2025-09-07T06:39:17.5535329Z * [new branch] gh/anijain2305/866/head -> origin/gh/anijain2305/866/head 2025-09-07T06:39:17.5535567Z * [new branch] gh/anijain2305/866/orig -> origin/gh/anijain2305/866/orig 2025-09-07T06:39:17.5535798Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-09-07T06:39:17.5536115Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-09-07T06:39:17.5536347Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-09-07T06:39:17.5536669Z * [new branch] gh/ankitageorge/13/base -> origin/gh/ankitageorge/13/base 2025-09-07T06:39:17.5536900Z * [new branch] gh/ankitageorge/13/head -> origin/gh/ankitageorge/13/head 2025-09-07T06:39:17.5537135Z * [new branch] gh/ankitageorge/13/orig -> origin/gh/ankitageorge/13/orig 2025-09-07T06:39:17.5538737Z * [new branch] gh/ankitageorge/14/base -> origin/gh/ankitageorge/14/base 2025-09-07T06:39:17.5538938Z * [new branch] gh/ankitageorge/14/head -> origin/gh/ankitageorge/14/head 2025-09-07T06:39:17.5539134Z * [new branch] gh/ankitageorge/14/orig -> origin/gh/ankitageorge/14/orig 2025-09-07T06:39:17.5539393Z * [new branch] gh/ankitageorge/15/base -> origin/gh/ankitageorge/15/base 2025-09-07T06:39:17.5539590Z * [new branch] gh/ankitageorge/15/head -> origin/gh/ankitageorge/15/head 2025-09-07T06:39:17.5539788Z * [new branch] gh/ankitageorge/15/orig -> origin/gh/ankitageorge/15/orig 2025-09-07T06:39:17.5539981Z * [new branch] gh/ankitageorge/16/base -> origin/gh/ankitageorge/16/base 2025-09-07T06:39:17.5540176Z * [new branch] gh/ankitageorge/16/head -> origin/gh/ankitageorge/16/head 2025-09-07T06:39:17.5540370Z * [new branch] gh/ankitageorge/16/orig -> origin/gh/ankitageorge/16/orig 2025-09-07T06:39:17.5540729Z * [new branch] gh/ankitageorge/17/base -> origin/gh/ankitageorge/17/base 2025-09-07T06:39:17.5540924Z * [new 
branch] gh/ankitageorge/17/head -> origin/gh/ankitageorge/17/head 2025-09-07T06:39:17.5541119Z * [new branch] gh/ankitageorge/17/orig -> origin/gh/ankitageorge/17/orig 2025-09-07T06:39:17.5541320Z * [new branch] gh/ankitageorge/21/base -> origin/gh/ankitageorge/21/base 2025-09-07T06:39:17.5541518Z * [new branch] gh/ankitageorge/21/head -> origin/gh/ankitageorge/21/head 2025-09-07T06:39:17.5541712Z * [new branch] gh/ankitageorge/21/orig -> origin/gh/ankitageorge/21/orig 2025-09-07T06:39:17.5541910Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-09-07T06:39:17.5542094Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-09-07T06:39:17.5542280Z * [new branch] gh/anshul-si/15/base -> origin/gh/anshul-si/15/base 2025-09-07T06:39:17.5542464Z * [new branch] gh/anshul-si/15/head -> origin/gh/anshul-si/15/head 2025-09-07T06:39:17.5542645Z * [new branch] gh/anshul-si/15/orig -> origin/gh/anshul-si/15/orig 2025-09-07T06:39:17.5542826Z * [new branch] gh/anshul-si/16/base -> origin/gh/anshul-si/16/base 2025-09-07T06:39:17.5543007Z * [new branch] gh/anshul-si/16/head -> origin/gh/anshul-si/16/head 2025-09-07T06:39:17.5543189Z * [new branch] gh/anshul-si/16/orig -> origin/gh/anshul-si/16/orig 2025-09-07T06:39:17.5543369Z * [new branch] gh/anshul-si/17/base -> origin/gh/anshul-si/17/base 2025-09-07T06:39:17.5543548Z * [new branch] gh/anshul-si/17/head -> origin/gh/anshul-si/17/head 2025-09-07T06:39:17.5543727Z * [new branch] gh/anshul-si/17/orig -> origin/gh/anshul-si/17/orig 2025-09-07T06:39:17.5543910Z * [new branch] gh/anshul-si/18/base -> origin/gh/anshul-si/18/base 2025-09-07T06:39:17.5544093Z * [new branch] gh/anshul-si/18/head -> origin/gh/anshul-si/18/head 2025-09-07T06:39:17.5544272Z * [new branch] gh/anshul-si/18/orig -> origin/gh/anshul-si/18/orig 2025-09-07T06:39:17.5544453Z * [new branch] gh/anshul-si/19/base -> origin/gh/anshul-si/19/base 2025-09-07T06:39:17.5544632Z * [new branch] gh/anshul-si/19/head -> 
origin/gh/anshul-si/19/head 2025-09-07T06:39:17.5546095Z * [new branch] gh/anshul-si/19/orig -> origin/gh/anshul-si/19/orig 2025-09-07T06:39:17.5546282Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-09-07T06:39:17.5546463Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-09-07T06:39:17.5546720Z * [new branch] gh/anshul-si/20/base -> origin/gh/anshul-si/20/base 2025-09-07T06:39:17.5546905Z * [new branch] gh/anshul-si/20/head -> origin/gh/anshul-si/20/head 2025-09-07T06:39:17.5547090Z * [new branch] gh/anshul-si/20/orig -> origin/gh/anshul-si/20/orig 2025-09-07T06:39:17.5547268Z * [new branch] gh/anshul-si/21/base -> origin/gh/anshul-si/21/base 2025-09-07T06:39:17.5547505Z * [new branch] gh/anshul-si/21/head -> origin/gh/anshul-si/21/head 2025-09-07T06:39:17.5547688Z * [new branch] gh/anshul-si/21/orig -> origin/gh/anshul-si/21/orig 2025-09-07T06:39:17.5547867Z * [new branch] gh/anshul-si/22/base -> origin/gh/anshul-si/22/base 2025-09-07T06:39:17.5548046Z * [new branch] gh/anshul-si/22/head -> origin/gh/anshul-si/22/head 2025-09-07T06:39:17.5548229Z * [new branch] gh/anshul-si/22/orig -> origin/gh/anshul-si/22/orig 2025-09-07T06:39:17.5548409Z * [new branch] gh/anshul-si/23/base -> origin/gh/anshul-si/23/base 2025-09-07T06:39:17.5549805Z * [new branch] gh/anshul-si/23/head -> origin/gh/anshul-si/23/head 2025-09-07T06:39:17.5549990Z * [new branch] gh/anshul-si/23/orig -> origin/gh/anshul-si/23/orig 2025-09-07T06:39:17.5550171Z * [new branch] gh/anshul-si/24/base -> origin/gh/anshul-si/24/base 2025-09-07T06:39:17.5550353Z * [new branch] gh/anshul-si/24/head -> origin/gh/anshul-si/24/head 2025-09-07T06:39:17.5550530Z * [new branch] gh/anshul-si/24/orig -> origin/gh/anshul-si/24/orig 2025-09-07T06:39:17.5550709Z * [new branch] gh/anshul-si/25/base -> origin/gh/anshul-si/25/base 2025-09-07T06:39:17.5550889Z * [new branch] gh/anshul-si/25/head -> origin/gh/anshul-si/25/head 2025-09-07T06:39:17.5551068Z * [new branch] 
gh/anshul-si/25/orig -> origin/gh/anshul-si/25/orig 2025-09-07T06:39:17.5551251Z * [new branch] gh/anshul-si/26/base -> origin/gh/anshul-si/26/base 2025-09-07T06:39:17.5551432Z * [new branch] gh/anshul-si/26/head -> origin/gh/anshul-si/26/head 2025-09-07T06:39:17.5551614Z * [new branch] gh/anshul-si/26/orig -> origin/gh/anshul-si/26/orig 2025-09-07T06:39:17.5552986Z * [new branch] gh/anshul-si/27/base -> origin/gh/anshul-si/27/base 2025-09-07T06:39:17.5553184Z * [new branch] gh/anshul-si/27/head -> origin/gh/anshul-si/27/head 2025-09-07T06:39:17.5553368Z * [new branch] gh/anshul-si/27/orig -> origin/gh/anshul-si/27/orig 2025-09-07T06:39:17.5553547Z * [new branch] gh/anshul-si/28/base -> origin/gh/anshul-si/28/base 2025-09-07T06:39:17.5553623Z * [new branch] gh/anshul-si/28/head -> origin/gh/anshul-si/28/head 2025-09-07T06:39:17.5553695Z * [new branch] gh/anshul-si/28/orig -> origin/gh/anshul-si/28/orig 2025-09-07T06:39:17.5553766Z * [new branch] gh/anshul-si/29/base -> origin/gh/anshul-si/29/base 2025-09-07T06:39:17.5553840Z * [new branch] gh/anshul-si/29/head -> origin/gh/anshul-si/29/head 2025-09-07T06:39:17.5553911Z * [new branch] gh/anshul-si/29/orig -> origin/gh/anshul-si/29/orig 2025-09-07T06:39:17.5553986Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-09-07T06:39:17.5554060Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-09-07T06:39:17.5554134Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-09-07T06:39:17.5554206Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-09-07T06:39:17.5554278Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-09-07T06:39:17.5554349Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-09-07T06:39:17.5554428Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-09-07T06:39:17.5554507Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-09-07T06:39:17.5554582Z * [new 
branch] gh/bdhirsh/650/base -> origin/gh/bdhirsh/650/base 2025-09-07T06:39:17.5554697Z * [new branch] gh/bdhirsh/650/head -> origin/gh/bdhirsh/650/head 2025-09-07T06:39:17.5554773Z * [new branch] gh/bdhirsh/650/orig -> origin/gh/bdhirsh/650/orig 2025-09-07T06:39:17.5554847Z * [new branch] gh/bdhirsh/663/base -> origin/gh/bdhirsh/663/base 2025-09-07T06:39:17.5554920Z * [new branch] gh/bdhirsh/663/head -> origin/gh/bdhirsh/663/head 2025-09-07T06:39:17.5554992Z * [new branch] gh/bdhirsh/663/orig -> origin/gh/bdhirsh/663/orig 2025-09-07T06:39:17.5555066Z * [new branch] gh/bdhirsh/665/base -> origin/gh/bdhirsh/665/base 2025-09-07T06:39:17.5555165Z * [new branch] gh/bdhirsh/665/head -> origin/gh/bdhirsh/665/head 2025-09-07T06:39:17.5556407Z * [new branch] gh/bdhirsh/665/orig -> origin/gh/bdhirsh/665/orig 2025-09-07T06:39:17.5556596Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-09-07T06:39:17.5556674Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-09-07T06:39:17.5556746Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-09-07T06:39:17.5556818Z * [new branch] gh/bdhirsh/667/base -> origin/gh/bdhirsh/667/base 2025-09-07T06:39:17.5556890Z * [new branch] gh/bdhirsh/667/head -> origin/gh/bdhirsh/667/head 2025-09-07T06:39:17.5556964Z * [new branch] gh/bdhirsh/667/orig -> origin/gh/bdhirsh/667/orig 2025-09-07T06:39:17.5557036Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-09-07T06:39:17.5557111Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-09-07T06:39:17.5557184Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-09-07T06:39:17.5557260Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-09-07T06:39:17.5557334Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-09-07T06:39:17.5557410Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-09-07T06:39:17.5557482Z * [new branch] 
gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-09-07T06:39:17.5557555Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-09-07T06:39:17.5557629Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-09-07T06:39:17.5557724Z * [new branch] gh/benjaminglass1/100/base -> origin/gh/benjaminglass1/100/base 2025-09-07T06:39:17.5557817Z * [new branch] gh/benjaminglass1/100/head -> origin/gh/benjaminglass1/100/head 2025-09-07T06:39:17.5557904Z * [new branch] gh/benjaminglass1/100/orig -> origin/gh/benjaminglass1/100/orig 2025-09-07T06:39:17.5558088Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-09-07T06:39:17.5558175Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-09-07T06:39:17.5558261Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-09-07T06:39:17.5558347Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-09-07T06:39:17.5558430Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-09-07T06:39:17.5559744Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-09-07T06:39:17.5559833Z * [new branch] gh/benjaminglass1/103/base -> origin/gh/benjaminglass1/103/base 2025-09-07T06:39:17.5559919Z * [new branch] gh/benjaminglass1/103/head -> origin/gh/benjaminglass1/103/head 2025-09-07T06:39:17.5560071Z * [new branch] gh/benjaminglass1/103/orig -> origin/gh/benjaminglass1/103/orig 2025-09-07T06:39:17.5560158Z * [new branch] gh/benjaminglass1/104/base -> origin/gh/benjaminglass1/104/base 2025-09-07T06:39:17.5560242Z * [new branch] gh/benjaminglass1/104/head -> origin/gh/benjaminglass1/104/head 2025-09-07T06:39:17.5560328Z * [new branch] gh/benjaminglass1/104/orig -> origin/gh/benjaminglass1/104/orig 2025-09-07T06:39:17.5560414Z * [new branch] gh/benjaminglass1/105/base -> origin/gh/benjaminglass1/105/base 2025-09-07T06:39:17.5560499Z * 
[new branch] gh/benjaminglass1/105/head -> origin/gh/benjaminglass1/105/head 2025-09-07T06:39:17.5560585Z * [new branch] gh/benjaminglass1/105/orig -> origin/gh/benjaminglass1/105/orig 2025-09-07T06:39:17.5560724Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-09-07T06:39:17.5560809Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-09-07T06:39:17.5560898Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-09-07T06:39:17.5560989Z * [new branch] gh/benjaminglass1/79/base -> origin/gh/benjaminglass1/79/base 2025-09-07T06:39:17.5561074Z * [new branch] gh/benjaminglass1/79/head -> origin/gh/benjaminglass1/79/head 2025-09-07T06:39:17.5561161Z * [new branch] gh/benjaminglass1/79/orig -> origin/gh/benjaminglass1/79/orig 2025-09-07T06:39:17.5561246Z * [new branch] gh/benjaminglass1/86/base -> origin/gh/benjaminglass1/86/base 2025-09-07T06:39:17.5561329Z * [new branch] gh/benjaminglass1/86/head -> origin/gh/benjaminglass1/86/head 2025-09-07T06:39:17.5561418Z * [new branch] gh/benjaminglass1/86/orig -> origin/gh/benjaminglass1/86/orig 2025-09-07T06:39:17.5561503Z * [new branch] gh/benjaminglass1/89/base -> origin/gh/benjaminglass1/89/base 2025-09-07T06:39:17.5561587Z * [new branch] gh/benjaminglass1/89/head -> origin/gh/benjaminglass1/89/head 2025-09-07T06:39:17.5561671Z * [new branch] gh/benjaminglass1/89/orig -> origin/gh/benjaminglass1/89/orig 2025-09-07T06:39:17.5562947Z * [new branch] gh/benjaminglass1/91/base -> origin/gh/benjaminglass1/91/base 2025-09-07T06:39:17.5563041Z * [new branch] gh/benjaminglass1/91/head -> origin/gh/benjaminglass1/91/head 2025-09-07T06:39:17.5563129Z * [new branch] gh/benjaminglass1/91/orig -> origin/gh/benjaminglass1/91/orig 2025-09-07T06:39:17.5563216Z * [new branch] gh/benjaminglass1/93/base -> origin/gh/benjaminglass1/93/base 2025-09-07T06:39:17.5563300Z * [new branch] gh/benjaminglass1/93/head -> origin/gh/benjaminglass1/93/head 
2025-09-07T06:39:17.5563383Z * [new branch] gh/benjaminglass1/93/orig -> origin/gh/benjaminglass1/93/orig 2025-09-07T06:39:17.5563467Z * [new branch] gh/benjaminglass1/95/base -> origin/gh/benjaminglass1/95/base 2025-09-07T06:39:17.5563552Z * [new branch] gh/benjaminglass1/95/head -> origin/gh/benjaminglass1/95/head 2025-09-07T06:39:17.5563635Z * [new branch] gh/benjaminglass1/95/orig -> origin/gh/benjaminglass1/95/orig 2025-09-07T06:39:17.5563720Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-09-07T06:39:17.5563802Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-09-07T06:39:17.5563886Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-09-07T06:39:17.5563970Z * [new branch] gh/benjaminglass1/99/base -> origin/gh/benjaminglass1/99/base 2025-09-07T06:39:17.5564059Z * [new branch] gh/benjaminglass1/99/head -> origin/gh/benjaminglass1/99/head 2025-09-07T06:39:17.5564142Z * [new branch] gh/benjaminglass1/99/orig -> origin/gh/benjaminglass1/99/orig 2025-09-07T06:39:17.5564259Z * [new branch] gh/bobrenjc93/514/base -> origin/gh/bobrenjc93/514/base 2025-09-07T06:39:17.5564339Z * [new branch] gh/bobrenjc93/514/head -> origin/gh/bobrenjc93/514/head 2025-09-07T06:39:17.5564420Z * [new branch] gh/bobrenjc93/514/orig -> origin/gh/bobrenjc93/514/orig 2025-09-07T06:39:17.5564500Z * [new branch] gh/bobrenjc93/521/base -> origin/gh/bobrenjc93/521/base 2025-09-07T06:39:17.5564578Z * [new branch] gh/bobrenjc93/521/head -> origin/gh/bobrenjc93/521/head 2025-09-07T06:39:17.5564655Z * [new branch] gh/bobrenjc93/521/orig -> origin/gh/bobrenjc93/521/orig 2025-09-07T06:39:17.5564763Z * [new branch] gh/bobrenjc93/522/base -> origin/gh/bobrenjc93/522/base 2025-09-07T06:39:17.5564839Z * [new branch] gh/bobrenjc93/522/head -> origin/gh/bobrenjc93/522/head 2025-09-07T06:39:17.5564917Z * [new branch] gh/bobrenjc93/522/orig -> origin/gh/bobrenjc93/522/orig 2025-09-07T06:39:17.5564996Z * [new 
branch] gh/bobrenjc93/525/base -> origin/gh/bobrenjc93/525/base 2025-09-07T06:39:17.5565072Z * [new branch] gh/bobrenjc93/525/head -> origin/gh/bobrenjc93/525/head 2025-09-07T06:39:17.5565148Z * [new branch] gh/bobrenjc93/525/orig -> origin/gh/bobrenjc93/525/orig 2025-09-07T06:39:17.5565225Z * [new branch] gh/bobrenjc93/526/base -> origin/gh/bobrenjc93/526/base 2025-09-07T06:39:17.5565302Z * [new branch] gh/bobrenjc93/526/head -> origin/gh/bobrenjc93/526/head 2025-09-07T06:39:17.5565381Z * [new branch] gh/bobrenjc93/526/orig -> origin/gh/bobrenjc93/526/orig 2025-09-07T06:39:17.5565460Z * [new branch] gh/bobrenjc93/527/base -> origin/gh/bobrenjc93/527/base 2025-09-07T06:39:17.5565539Z * [new branch] gh/bobrenjc93/527/head -> origin/gh/bobrenjc93/527/head 2025-09-07T06:39:17.5565616Z * [new branch] gh/bobrenjc93/527/orig -> origin/gh/bobrenjc93/527/orig 2025-09-07T06:39:17.5565693Z * [new branch] gh/bobrenjc93/528/base -> origin/gh/bobrenjc93/528/base 2025-09-07T06:39:17.5565775Z * [new branch] gh/bobrenjc93/528/head -> origin/gh/bobrenjc93/528/head 2025-09-07T06:39:17.5565851Z * [new branch] gh/bobrenjc93/528/orig -> origin/gh/bobrenjc93/528/orig 2025-09-07T06:39:17.5565927Z * [new branch] gh/bobrenjc93/529/base -> origin/gh/bobrenjc93/529/base 2025-09-07T06:39:17.5567267Z * [new branch] gh/bobrenjc93/529/head -> origin/gh/bobrenjc93/529/head 2025-09-07T06:39:17.5567354Z * [new branch] gh/bobrenjc93/529/orig -> origin/gh/bobrenjc93/529/orig 2025-09-07T06:39:17.5567437Z * [new branch] gh/bobrenjc93/535/base -> origin/gh/bobrenjc93/535/base 2025-09-07T06:39:17.5567514Z * [new branch] gh/bobrenjc93/535/head -> origin/gh/bobrenjc93/535/head 2025-09-07T06:39:17.5567592Z * [new branch] gh/bobrenjc93/535/orig -> origin/gh/bobrenjc93/535/orig 2025-09-07T06:39:17.5567668Z * [new branch] gh/bobrenjc93/537/base -> origin/gh/bobrenjc93/537/base 2025-09-07T06:39:17.5567749Z * [new branch] gh/bobrenjc93/537/head -> origin/gh/bobrenjc93/537/head 2025-09-07T06:39:17.5567826Z * [new 
branch] gh/bobrenjc93/537/orig -> origin/gh/bobrenjc93/537/orig 2025-09-07T06:39:17.5567910Z * [new branch] gh/bobrenjc93/539/base -> origin/gh/bobrenjc93/539/base 2025-09-07T06:39:17.5567990Z * [new branch] gh/bobrenjc93/539/head -> origin/gh/bobrenjc93/539/head 2025-09-07T06:39:17.5568070Z * [new branch] gh/bobrenjc93/539/orig -> origin/gh/bobrenjc93/539/orig 2025-09-07T06:39:17.5568148Z * [new branch] gh/bobrenjc93/540/base -> origin/gh/bobrenjc93/540/base 2025-09-07T06:39:17.5568276Z * [new branch] gh/bobrenjc93/540/head -> origin/gh/bobrenjc93/540/head 2025-09-07T06:39:17.5568354Z * [new branch] gh/bobrenjc93/540/orig -> origin/gh/bobrenjc93/540/orig 2025-09-07T06:39:17.5568430Z * [new branch] gh/bobrenjc93/541/base -> origin/gh/bobrenjc93/541/base 2025-09-07T06:39:17.5568507Z * [new branch] gh/bobrenjc93/541/head -> origin/gh/bobrenjc93/541/head 2025-09-07T06:39:17.5568585Z * [new branch] gh/bobrenjc93/541/orig -> origin/gh/bobrenjc93/541/orig 2025-09-07T06:39:17.5568660Z * [new branch] gh/bobrenjc93/542/base -> origin/gh/bobrenjc93/542/base 2025-09-07T06:39:17.5568739Z * [new branch] gh/bobrenjc93/542/head -> origin/gh/bobrenjc93/542/head 2025-09-07T06:39:17.5568849Z * [new branch] gh/bobrenjc93/542/orig -> origin/gh/bobrenjc93/542/orig 2025-09-07T06:39:17.5568926Z * [new branch] gh/bobrenjc93/543/base -> origin/gh/bobrenjc93/543/base 2025-09-07T06:39:17.5569009Z * [new branch] gh/bobrenjc93/543/head -> origin/gh/bobrenjc93/543/head 2025-09-07T06:39:17.5569085Z * [new branch] gh/bobrenjc93/543/orig -> origin/gh/bobrenjc93/543/orig 2025-09-07T06:39:17.5570346Z * [new branch] gh/bobrenjc93/544/base -> origin/gh/bobrenjc93/544/base 2025-09-07T06:39:17.5570437Z * [new branch] gh/bobrenjc93/544/head -> origin/gh/bobrenjc93/544/head 2025-09-07T06:39:17.5570516Z * [new branch] gh/bobrenjc93/544/orig -> origin/gh/bobrenjc93/544/orig 2025-09-07T06:39:17.5570592Z * [new branch] gh/bobrenjc93/545/base -> origin/gh/bobrenjc93/545/base 2025-09-07T06:39:17.5570671Z * [new 
branch] gh/bobrenjc93/545/head -> origin/gh/bobrenjc93/545/head 2025-09-07T06:39:17.5570747Z * [new branch] gh/bobrenjc93/545/orig -> origin/gh/bobrenjc93/545/orig 2025-09-07T06:39:17.5570827Z * [new branch] gh/bobrenjc93/546/base -> origin/gh/bobrenjc93/546/base 2025-09-07T06:39:17.5570906Z * [new branch] gh/bobrenjc93/546/head -> origin/gh/bobrenjc93/546/head 2025-09-07T06:39:17.5570985Z * [new branch] gh/bobrenjc93/546/orig -> origin/gh/bobrenjc93/546/orig 2025-09-07T06:39:17.5571061Z * [new branch] gh/bobrenjc93/547/base -> origin/gh/bobrenjc93/547/base 2025-09-07T06:39:17.5571141Z * [new branch] gh/bobrenjc93/547/head -> origin/gh/bobrenjc93/547/head 2025-09-07T06:39:17.5571217Z * [new branch] gh/bobrenjc93/547/orig -> origin/gh/bobrenjc93/547/orig 2025-09-07T06:39:17.5571293Z * [new branch] gh/bobrenjc93/548/base -> origin/gh/bobrenjc93/548/base 2025-09-07T06:39:17.5571373Z * [new branch] gh/bobrenjc93/548/head -> origin/gh/bobrenjc93/548/head 2025-09-07T06:39:17.5571450Z * [new branch] gh/bobrenjc93/548/orig -> origin/gh/bobrenjc93/548/orig 2025-09-07T06:39:17.5571527Z * [new branch] gh/bobrenjc93/549/base -> origin/gh/bobrenjc93/549/base 2025-09-07T06:39:17.5571607Z * [new branch] gh/bobrenjc93/549/head -> origin/gh/bobrenjc93/549/head 2025-09-07T06:39:17.5571684Z * [new branch] gh/bobrenjc93/549/orig -> origin/gh/bobrenjc93/549/orig 2025-09-07T06:39:17.5571761Z * [new branch] gh/bobrenjc93/550/base -> origin/gh/bobrenjc93/550/base 2025-09-07T06:39:17.5571839Z * [new branch] gh/bobrenjc93/550/head -> origin/gh/bobrenjc93/550/head 2025-09-07T06:39:17.5571915Z * [new branch] gh/bobrenjc93/550/orig -> origin/gh/bobrenjc93/550/orig 2025-09-07T06:39:17.5571994Z * [new branch] gh/bobrenjc93/551/base -> origin/gh/bobrenjc93/551/base 2025-09-07T06:39:17.5572075Z * [new branch] gh/bobrenjc93/551/head -> origin/gh/bobrenjc93/551/head 2025-09-07T06:39:17.5572152Z * [new branch] gh/bobrenjc93/551/orig -> origin/gh/bobrenjc93/551/orig 2025-09-07T06:39:17.5572261Z * [new 
branch] gh/bobrenjc93/552/base -> origin/gh/bobrenjc93/552/base 2025-09-07T06:39:17.5572338Z * [new branch] gh/bobrenjc93/552/head -> origin/gh/bobrenjc93/552/head 2025-09-07T06:39:17.5572416Z * [new branch] gh/bobrenjc93/552/orig -> origin/gh/bobrenjc93/552/orig 2025-09-07T06:39:17.5572493Z * [new branch] gh/bobrenjc93/553/base -> origin/gh/bobrenjc93/553/base 2025-09-07T06:39:17.5572570Z * [new branch] gh/bobrenjc93/553/head -> origin/gh/bobrenjc93/553/head 2025-09-07T06:39:17.5572663Z * [new branch] gh/bobrenjc93/553/orig -> origin/gh/bobrenjc93/553/orig 2025-09-07T06:39:17.5572769Z * [new branch] gh/bobrenjc93/554/base -> origin/gh/bobrenjc93/554/base 2025-09-07T06:39:17.5574024Z * [new branch] gh/bobrenjc93/554/head -> origin/gh/bobrenjc93/554/head 2025-09-07T06:39:17.5574121Z * [new branch] gh/bobrenjc93/554/orig -> origin/gh/bobrenjc93/554/orig 2025-09-07T06:39:17.5574198Z * [new branch] gh/bobrenjc93/555/base -> origin/gh/bobrenjc93/555/base 2025-09-07T06:39:17.5574277Z * [new branch] gh/bobrenjc93/555/head -> origin/gh/bobrenjc93/555/head 2025-09-07T06:39:17.5574353Z * [new branch] gh/bobrenjc93/555/orig -> origin/gh/bobrenjc93/555/orig 2025-09-07T06:39:17.5574429Z * [new branch] gh/bobrenjc93/556/base -> origin/gh/bobrenjc93/556/base 2025-09-07T06:39:17.5574507Z * [new branch] gh/bobrenjc93/556/head -> origin/gh/bobrenjc93/556/head 2025-09-07T06:39:17.5574582Z * [new branch] gh/bobrenjc93/556/orig -> origin/gh/bobrenjc93/556/orig 2025-09-07T06:39:17.5574671Z * [new branch] gh/briancoutinho/2/base -> origin/gh/briancoutinho/2/base 2025-09-07T06:39:17.5574755Z * [new branch] gh/briancoutinho/2/head -> origin/gh/briancoutinho/2/head 2025-09-07T06:39:17.5574830Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-09-07T06:39:17.5574898Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-09-07T06:39:17.5574967Z * [new branch] gh/c00w/48/base -> origin/gh/c00w/48/base 2025-09-07T06:39:17.5575032Z * [new branch] gh/c00w/48/head -> 
origin/gh/c00w/48/head 2025-09-07T06:39:17.5575100Z * [new branch] gh/c00w/48/orig -> origin/gh/c00w/48/orig 2025-09-07T06:39:17.5575167Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-09-07T06:39:17.5575233Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-09-07T06:39:17.5575299Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-09-07T06:39:17.5575365Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-09-07T06:39:17.5575432Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-09-07T06:39:17.5575496Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-09-07T06:39:17.5575561Z * [new branch] gh/c00w/55/base -> origin/gh/c00w/55/base 2025-09-07T06:39:17.5575626Z * [new branch] gh/c00w/55/head -> origin/gh/c00w/55/head 2025-09-07T06:39:17.5575694Z * [new branch] gh/c00w/55/orig -> origin/gh/c00w/55/orig 2025-09-07T06:39:17.5576991Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-09-07T06:39:17.5577068Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-09-07T06:39:17.5577137Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-09-07T06:39:17.5577215Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-09-07T06:39:17.5577362Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-09-07T06:39:17.5577435Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-09-07T06:39:17.5577516Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-09-07T06:39:17.5577596Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-09-07T06:39:17.5577677Z * [new branch] gh/coconutruben/11/base -> origin/gh/coconutruben/11/base 2025-09-07T06:39:17.5577757Z * [new branch] gh/coconutruben/11/head -> origin/gh/coconutruben/11/head 2025-09-07T06:39:17.5577838Z * [new branch] gh/coconutruben/11/orig -> origin/gh/coconutruben/11/orig 2025-09-07T06:39:17.5577964Z * [new branch] gh/coconutruben/12/base -> 
origin/gh/coconutruben/12/base 2025-09-07T06:39:17.5578042Z * [new branch] gh/coconutruben/12/head -> origin/gh/coconutruben/12/head 2025-09-07T06:39:17.5578124Z * [new branch] gh/coconutruben/12/orig -> origin/gh/coconutruben/12/orig 2025-09-07T06:39:17.5578203Z * [new branch] gh/coconutruben/13/base -> origin/gh/coconutruben/13/base 2025-09-07T06:39:17.5578282Z * [new branch] gh/coconutruben/13/head -> origin/gh/coconutruben/13/head 2025-09-07T06:39:17.5578362Z * [new branch] gh/coconutruben/13/orig -> origin/gh/coconutruben/13/orig 2025-09-07T06:39:17.5578441Z * [new branch] gh/coconutruben/14/base -> origin/gh/coconutruben/14/base 2025-09-07T06:39:17.5578519Z * [new branch] gh/coconutruben/14/head -> origin/gh/coconutruben/14/head 2025-09-07T06:39:17.5578600Z * [new branch] gh/coconutruben/14/orig -> origin/gh/coconutruben/14/orig 2025-09-07T06:39:17.5578679Z * [new branch] gh/coconutruben/15/base -> origin/gh/coconutruben/15/base 2025-09-07T06:39:17.5578759Z * [new branch] gh/coconutruben/15/head -> origin/gh/coconutruben/15/head 2025-09-07T06:39:17.5578838Z * [new branch] gh/coconutruben/15/orig -> origin/gh/coconutruben/15/orig 2025-09-07T06:39:17.5580280Z * [new branch] gh/coconutruben/16/base -> origin/gh/coconutruben/16/base 2025-09-07T06:39:17.5580379Z * [new branch] gh/coconutruben/16/head -> origin/gh/coconutruben/16/head 2025-09-07T06:39:17.5580460Z * [new branch] gh/coconutruben/16/orig -> origin/gh/coconutruben/16/orig 2025-09-07T06:39:17.5580541Z * [new branch] gh/coconutruben/17/base -> origin/gh/coconutruben/17/base 2025-09-07T06:39:17.5580621Z * [new branch] gh/coconutruben/17/head -> origin/gh/coconutruben/17/head 2025-09-07T06:39:17.5580704Z * [new branch] gh/coconutruben/17/orig -> origin/gh/coconutruben/17/orig 2025-09-07T06:39:17.5580783Z * [new branch] gh/coconutruben/18/base -> origin/gh/coconutruben/18/base 2025-09-07T06:39:17.5580863Z * [new branch] gh/coconutruben/18/head -> origin/gh/coconutruben/18/head 2025-09-07T06:39:17.5580947Z * 
[new branch] gh/coconutruben/18/orig -> origin/gh/coconutruben/18/orig 2025-09-07T06:39:17.5581029Z * [new branch] gh/coconutruben/19/base -> origin/gh/coconutruben/19/base 2025-09-07T06:39:17.5581109Z * [new branch] gh/coconutruben/19/head -> origin/gh/coconutruben/19/head 2025-09-07T06:39:17.5581188Z * [new branch] gh/coconutruben/19/orig -> origin/gh/coconutruben/19/orig 2025-09-07T06:39:17.5581268Z * [new branch] gh/coconutruben/20/base -> origin/gh/coconutruben/20/base 2025-09-07T06:39:17.5581349Z * [new branch] gh/coconutruben/20/head -> origin/gh/coconutruben/20/head 2025-09-07T06:39:17.5581431Z * [new branch] gh/coconutruben/20/orig -> origin/gh/coconutruben/20/orig 2025-09-07T06:39:17.5581549Z * [new branch] gh/coconutruben/21/base -> origin/gh/coconutruben/21/base 2025-09-07T06:39:17.5581633Z * [new branch] gh/coconutruben/21/head -> origin/gh/coconutruben/21/head 2025-09-07T06:39:17.5581712Z * [new branch] gh/coconutruben/21/orig -> origin/gh/coconutruben/21/orig 2025-09-07T06:39:17.5581791Z * [new branch] gh/coconutruben/22/base -> origin/gh/coconutruben/22/base 2025-09-07T06:39:17.5581870Z * [new branch] gh/coconutruben/22/head -> origin/gh/coconutruben/22/head 2025-09-07T06:39:17.5581949Z * [new branch] gh/coconutruben/22/orig -> origin/gh/coconutruben/22/orig 2025-09-07T06:39:17.5582029Z * [new branch] gh/coconutruben/24/base -> origin/gh/coconutruben/24/base 2025-09-07T06:39:17.5582140Z * [new branch] gh/coconutruben/24/head -> origin/gh/coconutruben/24/head 2025-09-07T06:39:17.5582221Z * [new branch] gh/coconutruben/24/orig -> origin/gh/coconutruben/24/orig 2025-09-07T06:39:17.5582302Z * [new branch] gh/coconutruben/25/base -> origin/gh/coconutruben/25/base 2025-09-07T06:39:17.5582381Z * [new branch] gh/coconutruben/25/head -> origin/gh/coconutruben/25/head 2025-09-07T06:39:17.5582461Z * [new branch] gh/coconutruben/25/orig -> origin/gh/coconutruben/25/orig 2025-09-07T06:39:17.5582540Z * [new branch] gh/coconutruben/28/base -> 
origin/gh/coconutruben/28/base 2025-09-07T06:39:17.5582618Z * [new branch] gh/coconutruben/28/head -> origin/gh/coconutruben/28/head 2025-09-07T06:39:17.5583967Z * [new branch] gh/coconutruben/28/orig -> origin/gh/coconutruben/28/orig 2025-09-07T06:39:17.5584063Z * [new branch] gh/coconutruben/29/base -> origin/gh/coconutruben/29/base 2025-09-07T06:39:17.5584144Z * [new branch] gh/coconutruben/29/head -> origin/gh/coconutruben/29/head 2025-09-07T06:39:17.5584222Z * [new branch] gh/coconutruben/29/orig -> origin/gh/coconutruben/29/orig 2025-09-07T06:39:17.5584304Z * [new branch] gh/coconutruben/30/base -> origin/gh/coconutruben/30/base 2025-09-07T06:39:17.5584384Z * [new branch] gh/coconutruben/30/head -> origin/gh/coconutruben/30/head 2025-09-07T06:39:17.5584464Z * [new branch] gh/coconutruben/30/orig -> origin/gh/coconutruben/30/orig 2025-09-07T06:39:17.5584543Z * [new branch] gh/coconutruben/31/base -> origin/gh/coconutruben/31/base 2025-09-07T06:39:17.5584622Z * [new branch] gh/coconutruben/31/head -> origin/gh/coconutruben/31/head 2025-09-07T06:39:17.5584702Z * [new branch] gh/coconutruben/31/orig -> origin/gh/coconutruben/31/orig 2025-09-07T06:39:17.5584782Z * [new branch] gh/coconutruben/32/base -> origin/gh/coconutruben/32/base 2025-09-07T06:39:17.5584862Z * [new branch] gh/coconutruben/32/head -> origin/gh/coconutruben/32/head 2025-09-07T06:39:17.5584942Z * [new branch] gh/coconutruben/32/orig -> origin/gh/coconutruben/32/orig 2025-09-07T06:39:17.5585022Z * [new branch] gh/coconutruben/33/base -> origin/gh/coconutruben/33/base 2025-09-07T06:39:17.5585102Z * [new branch] gh/coconutruben/33/head -> origin/gh/coconutruben/33/head 2025-09-07T06:39:17.5585182Z * [new branch] gh/coconutruben/33/orig -> origin/gh/coconutruben/33/orig 2025-09-07T06:39:17.5585261Z * [new branch] gh/coconutruben/34/base -> origin/gh/coconutruben/34/base 2025-09-07T06:39:17.5585341Z * [new branch] gh/coconutruben/34/head -> origin/gh/coconutruben/34/head 2025-09-07T06:39:17.5585425Z * 
[new branch] gh/coconutruben/34/orig -> origin/gh/coconutruben/34/orig 2025-09-07T06:39:17.5585504Z * [new branch] gh/coconutruben/35/base -> origin/gh/coconutruben/35/base 2025-09-07T06:39:17.5585584Z * [new branch] gh/coconutruben/35/head -> origin/gh/coconutruben/35/head 2025-09-07T06:39:17.5585699Z * [new branch] gh/coconutruben/35/orig -> origin/gh/coconutruben/35/orig 2025-09-07T06:39:17.5585779Z * [new branch] gh/coconutruben/36/base -> origin/gh/coconutruben/36/base 2025-09-07T06:39:17.5587152Z * [new branch] gh/coconutruben/36/head -> origin/gh/coconutruben/36/head 2025-09-07T06:39:17.5587246Z * [new branch] gh/coconutruben/36/orig -> origin/gh/coconutruben/36/orig 2025-09-07T06:39:17.5587329Z * [new branch] gh/coconutruben/37/base -> origin/gh/coconutruben/37/base 2025-09-07T06:39:17.5587409Z * [new branch] gh/coconutruben/37/head -> origin/gh/coconutruben/37/head 2025-09-07T06:39:17.5587544Z * [new branch] gh/coconutruben/37/orig -> origin/gh/coconutruben/37/orig 2025-09-07T06:39:17.5587623Z * [new branch] gh/coconutruben/38/base -> origin/gh/coconutruben/38/base 2025-09-07T06:39:17.5587705Z * [new branch] gh/coconutruben/38/head -> origin/gh/coconutruben/38/head 2025-09-07T06:39:17.5587784Z * [new branch] gh/coconutruben/38/orig -> origin/gh/coconutruben/38/orig 2025-09-07T06:39:17.5587864Z * [new branch] gh/coconutruben/39/base -> origin/gh/coconutruben/39/base 2025-09-07T06:39:17.5587942Z * [new branch] gh/coconutruben/39/head -> origin/gh/coconutruben/39/head 2025-09-07T06:39:17.5588023Z * [new branch] gh/coconutruben/39/orig -> origin/gh/coconutruben/39/orig 2025-09-07T06:39:17.5588101Z * [new branch] gh/coconutruben/40/base -> origin/gh/coconutruben/40/base 2025-09-07T06:39:17.5588182Z * [new branch] gh/coconutruben/40/head -> origin/gh/coconutruben/40/head 2025-09-07T06:39:17.5588263Z * [new branch] gh/coconutruben/40/orig -> origin/gh/coconutruben/40/orig 2025-09-07T06:39:17.5588345Z * [new branch] gh/coconutruben/41/base -> 
origin/gh/coconutruben/41/base 2025-09-07T06:39:17.5588425Z * [new branch] gh/coconutruben/41/head -> origin/gh/coconutruben/41/head 2025-09-07T06:39:17.5588506Z * [new branch] gh/coconutruben/41/orig -> origin/gh/coconutruben/41/orig 2025-09-07T06:39:17.5588585Z * [new branch] gh/coconutruben/42/base -> origin/gh/coconutruben/42/base 2025-09-07T06:39:17.5588667Z * [new branch] gh/coconutruben/42/head -> origin/gh/coconutruben/42/head 2025-09-07T06:39:17.5588750Z * [new branch] gh/coconutruben/42/orig -> origin/gh/coconutruben/42/orig 2025-09-07T06:39:17.5588830Z * [new branch] gh/coconutruben/43/base -> origin/gh/coconutruben/43/base 2025-09-07T06:39:17.5588912Z * [new branch] gh/coconutruben/43/head -> origin/gh/coconutruben/43/head 2025-09-07T06:39:17.5588993Z * [new branch] gh/coconutruben/43/orig -> origin/gh/coconutruben/43/orig 2025-09-07T06:39:17.5589075Z * [new branch] gh/coconutruben/44/base -> origin/gh/coconutruben/44/base 2025-09-07T06:39:17.5589155Z * [new branch] gh/coconutruben/44/head -> origin/gh/coconutruben/44/head 2025-09-07T06:39:17.5589235Z * [new branch] gh/coconutruben/44/orig -> origin/gh/coconutruben/44/orig 2025-09-07T06:39:17.5589313Z * [new branch] gh/coconutruben/45/base -> origin/gh/coconutruben/45/base 2025-09-07T06:39:17.5589392Z * [new branch] gh/coconutruben/45/head -> origin/gh/coconutruben/45/head 2025-09-07T06:39:17.5589473Z * [new branch] gh/coconutruben/45/orig -> origin/gh/coconutruben/45/orig 2025-09-07T06:39:17.5589554Z * [new branch] gh/coconutruben/46/base -> origin/gh/coconutruben/46/base 2025-09-07T06:39:17.5589632Z * [new branch] gh/coconutruben/46/head -> origin/gh/coconutruben/46/head 2025-09-07T06:39:17.5589713Z * [new branch] gh/coconutruben/46/orig -> origin/gh/coconutruben/46/orig 2025-09-07T06:39:17.5589843Z * [new branch] gh/coconutruben/47/base -> origin/gh/coconutruben/47/base 2025-09-07T06:39:17.5589923Z * [new branch] gh/coconutruben/47/head -> origin/gh/coconutruben/47/head 2025-09-07T06:39:17.5590002Z * 
[new branch] gh/coconutruben/47/orig -> origin/gh/coconutruben/47/orig 2025-09-07T06:39:17.5590081Z * [new branch] gh/coconutruben/48/base -> origin/gh/coconutruben/48/base 2025-09-07T06:39:17.5591389Z * [new branch] gh/coconutruben/48/head -> origin/gh/coconutruben/48/head 2025-09-07T06:39:17.5591475Z * [new branch] gh/coconutruben/48/orig -> origin/gh/coconutruben/48/orig 2025-09-07T06:39:17.5591633Z * [new branch] gh/coconutruben/49/base -> origin/gh/coconutruben/49/base 2025-09-07T06:39:17.5591714Z * [new branch] gh/coconutruben/49/head -> origin/gh/coconutruben/49/head 2025-09-07T06:39:17.5591794Z * [new branch] gh/coconutruben/49/orig -> origin/gh/coconutruben/49/orig 2025-09-07T06:39:17.5591873Z * [new branch] gh/coconutruben/50/base -> origin/gh/coconutruben/50/base 2025-09-07T06:39:17.5591954Z * [new branch] gh/coconutruben/50/head -> origin/gh/coconutruben/50/head 2025-09-07T06:39:17.5592032Z * [new branch] gh/coconutruben/50/orig -> origin/gh/coconutruben/50/orig 2025-09-07T06:39:17.5592114Z * [new branch] gh/coconutruben/51/base -> origin/gh/coconutruben/51/base 2025-09-07T06:39:17.5592197Z * [new branch] gh/coconutruben/51/head -> origin/gh/coconutruben/51/head 2025-09-07T06:39:17.5592275Z * [new branch] gh/coconutruben/51/orig -> origin/gh/coconutruben/51/orig 2025-09-07T06:39:17.5592359Z * [new branch] gh/coconutruben/52/base -> origin/gh/coconutruben/52/base 2025-09-07T06:39:17.5592441Z * [new branch] gh/coconutruben/52/head -> origin/gh/coconutruben/52/head 2025-09-07T06:39:17.5592524Z * [new branch] gh/coconutruben/52/orig -> origin/gh/coconutruben/52/orig 2025-09-07T06:39:17.5592604Z * [new branch] gh/coconutruben/53/base -> origin/gh/coconutruben/53/base 2025-09-07T06:39:17.5592683Z * [new branch] gh/coconutruben/53/head -> origin/gh/coconutruben/53/head 2025-09-07T06:39:17.5592763Z * [new branch] gh/coconutruben/53/orig -> origin/gh/coconutruben/53/orig 2025-09-07T06:39:17.5592842Z * [new branch] gh/coconutruben/54/base -> 
origin/gh/coconutruben/54/base 2025-09-07T06:39:17.5592922Z * [new branch] gh/coconutruben/54/head -> origin/gh/coconutruben/54/head 2025-09-07T06:39:17.5593003Z * [new branch] gh/coconutruben/54/orig -> origin/gh/coconutruben/54/orig 2025-09-07T06:39:17.5593082Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-09-07T06:39:17.5593162Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-09-07T06:39:17.5593244Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-09-07T06:39:17.5594500Z * [new branch] gh/coconutruben/56/base -> origin/gh/coconutruben/56/base 2025-09-07T06:39:17.5594593Z * [new branch] gh/coconutruben/56/head -> origin/gh/coconutruben/56/head 2025-09-07T06:39:17.5594676Z * [new branch] gh/coconutruben/56/orig -> origin/gh/coconutruben/56/orig 2025-09-07T06:39:17.5594755Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-09-07T06:39:17.5594837Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-09-07T06:39:17.5594920Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-09-07T06:39:17.5594999Z * [new branch] gh/coconutruben/58/base -> origin/gh/coconutruben/58/base 2025-09-07T06:39:17.5595122Z * [new branch] gh/coconutruben/58/head -> origin/gh/coconutruben/58/head 2025-09-07T06:39:17.5595202Z * [new branch] gh/coconutruben/58/orig -> origin/gh/coconutruben/58/orig 2025-09-07T06:39:17.5595284Z * [new branch] gh/coconutruben/59/base -> origin/gh/coconutruben/59/base 2025-09-07T06:39:17.5595364Z * [new branch] gh/coconutruben/59/head -> origin/gh/coconutruben/59/head 2025-09-07T06:39:17.5595444Z * [new branch] gh/coconutruben/59/orig -> origin/gh/coconutruben/59/orig 2025-09-07T06:39:17.5595527Z * [new branch] gh/coconutruben/60/base -> origin/gh/coconutruben/60/base 2025-09-07T06:39:17.5595632Z * [new branch] gh/coconutruben/60/head -> origin/gh/coconutruben/60/head 2025-09-07T06:39:17.5595712Z * 
[new branch] gh/coconutruben/60/orig -> origin/gh/coconutruben/60/orig 2025-09-07T06:39:17.5595792Z * [new branch] gh/coconutruben/61/base -> origin/gh/coconutruben/61/base 2025-09-07T06:39:17.5595871Z * [new branch] gh/coconutruben/61/head -> origin/gh/coconutruben/61/head 2025-09-07T06:39:17.5595954Z * [new branch] gh/coconutruben/61/orig -> origin/gh/coconutruben/61/orig 2025-09-07T06:39:17.5596032Z * [new branch] gh/coconutruben/62/base -> origin/gh/coconutruben/62/base 2025-09-07T06:39:17.5596111Z * [new branch] gh/coconutruben/62/head -> origin/gh/coconutruben/62/head 2025-09-07T06:39:17.5596194Z * [new branch] gh/coconutruben/62/orig -> origin/gh/coconutruben/62/orig 2025-09-07T06:39:17.5596273Z * [new branch] gh/coconutruben/63/base -> origin/gh/coconutruben/63/base 2025-09-07T06:39:17.5596354Z * [new branch] gh/coconutruben/63/head -> origin/gh/coconutruben/63/head 2025-09-07T06:39:17.5596437Z * [new branch] gh/coconutruben/63/orig -> origin/gh/coconutruben/63/orig 2025-09-07T06:39:17.5596615Z * [new branch] gh/coconutruben/64/base -> origin/gh/coconutruben/64/base 2025-09-07T06:39:17.5596695Z * [new branch] gh/coconutruben/64/head -> origin/gh/coconutruben/64/head 2025-09-07T06:39:17.5596773Z * [new branch] gh/coconutruben/64/orig -> origin/gh/coconutruben/64/orig 2025-09-07T06:39:17.5596857Z * [new branch] gh/coconutruben/65/base -> origin/gh/coconutruben/65/base 2025-09-07T06:39:17.5596936Z * [new branch] gh/coconutruben/65/head -> origin/gh/coconutruben/65/head 2025-09-07T06:39:17.5598355Z * [new branch] gh/coconutruben/65/orig -> origin/gh/coconutruben/65/orig 2025-09-07T06:39:17.5598438Z * [new branch] gh/coconutruben/66/base -> origin/gh/coconutruben/66/base 2025-09-07T06:39:17.5598522Z * [new branch] gh/coconutruben/66/head -> origin/gh/coconutruben/66/head 2025-09-07T06:39:17.5598607Z * [new branch] gh/coconutruben/66/orig -> origin/gh/coconutruben/66/orig 2025-09-07T06:39:17.5598702Z * [new branch] gh/codingwithsurya/12/base -> 
origin/gh/codingwithsurya/12/base 2025-09-07T06:39:17.5598792Z * [new branch] gh/codingwithsurya/12/head -> origin/gh/codingwithsurya/12/head 2025-09-07T06:39:17.5598880Z * [new branch] gh/codingwithsurya/12/orig -> origin/gh/codingwithsurya/12/orig 2025-09-07T06:39:17.5598966Z * [new branch] gh/codingwithsurya/14/base -> origin/gh/codingwithsurya/14/base 2025-09-07T06:39:17.5599052Z * [new branch] gh/codingwithsurya/14/head -> origin/gh/codingwithsurya/14/head 2025-09-07T06:39:17.5599140Z * [new branch] gh/codingwithsurya/14/orig -> origin/gh/codingwithsurya/14/orig 2025-09-07T06:39:17.5599231Z * [new branch] gh/codingwithsurya/15/base -> origin/gh/codingwithsurya/15/base 2025-09-07T06:39:17.5599318Z * [new branch] gh/codingwithsurya/15/head -> origin/gh/codingwithsurya/15/head 2025-09-07T06:39:17.5599460Z * [new branch] gh/codingwithsurya/15/orig -> origin/gh/codingwithsurya/15/orig 2025-09-07T06:39:17.5599549Z * [new branch] gh/codingwithsurya/16/base -> origin/gh/codingwithsurya/16/base 2025-09-07T06:39:17.5599638Z * [new branch] gh/codingwithsurya/16/head -> origin/gh/codingwithsurya/16/head 2025-09-07T06:39:17.5599726Z * [new branch] gh/codingwithsurya/16/orig -> origin/gh/codingwithsurya/16/orig 2025-09-07T06:39:17.5599811Z * [new branch] gh/codingwithsurya/17/base -> origin/gh/codingwithsurya/17/base 2025-09-07T06:39:17.5599900Z * [new branch] gh/codingwithsurya/17/head -> origin/gh/codingwithsurya/17/head 2025-09-07T06:39:17.5600032Z * [new branch] gh/codingwithsurya/17/orig -> origin/gh/codingwithsurya/17/orig 2025-09-07T06:39:17.5600118Z * [new branch] gh/codingwithsurya/18/base -> origin/gh/codingwithsurya/18/base 2025-09-07T06:39:17.5600205Z * [new branch] gh/codingwithsurya/18/head -> origin/gh/codingwithsurya/18/head 2025-09-07T06:39:17.5600292Z * [new branch] gh/codingwithsurya/18/orig -> origin/gh/codingwithsurya/18/orig 2025-09-07T06:39:17.5601585Z * [new branch] gh/codingwithsurya/19/base -> origin/gh/codingwithsurya/19/base 
2025-09-07T06:39:17.5601684Z * [new branch] gh/codingwithsurya/19/head -> origin/gh/codingwithsurya/19/head 2025-09-07T06:39:17.5601773Z * [new branch] gh/codingwithsurya/19/orig -> origin/gh/codingwithsurya/19/orig 2025-09-07T06:39:17.5601862Z * [new branch] gh/codingwithsurya/20/base -> origin/gh/codingwithsurya/20/base 2025-09-07T06:39:17.5601951Z * [new branch] gh/codingwithsurya/20/head -> origin/gh/codingwithsurya/20/head 2025-09-07T06:39:17.5602040Z * [new branch] gh/codingwithsurya/20/orig -> origin/gh/codingwithsurya/20/orig 2025-09-07T06:39:17.5602127Z * [new branch] gh/codingwithsurya/21/base -> origin/gh/codingwithsurya/21/base 2025-09-07T06:39:17.5602213Z * [new branch] gh/codingwithsurya/21/head -> origin/gh/codingwithsurya/21/head 2025-09-07T06:39:17.5602299Z * [new branch] gh/codingwithsurya/21/orig -> origin/gh/codingwithsurya/21/orig 2025-09-07T06:39:17.5602383Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-09-07T06:39:17.5602462Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-09-07T06:39:17.5602541Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-09-07T06:39:17.5602620Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-09-07T06:39:17.5602697Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-09-07T06:39:17.5602773Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-09-07T06:39:17.5602851Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-09-07T06:39:17.5602927Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-09-07T06:39:17.5603015Z * [new branch] gh/davidberard98/382/base -> origin/gh/davidberard98/382/base 2025-09-07T06:39:17.5603101Z * [new branch] gh/davidberard98/382/head -> origin/gh/davidberard98/382/head 2025-09-07T06:39:17.5603185Z * [new branch] gh/davidberard98/382/orig -> origin/gh/davidberard98/382/orig 2025-09-07T06:39:17.5603267Z * 
[new branch] gh/davidberard98/386/base -> origin/gh/davidberard98/386/base 2025-09-07T06:39:17.5603357Z * [new branch] gh/davidberard98/386/head -> origin/gh/davidberard98/386/head 2025-09-07T06:39:17.5603440Z * [new branch] gh/davidberard98/386/orig -> origin/gh/davidberard98/386/orig 2025-09-07T06:39:17.5603561Z * [new branch] gh/davidberard98/391/base -> origin/gh/davidberard98/391/base 2025-09-07T06:39:17.5603644Z * [new branch] gh/davidberard98/391/head -> origin/gh/davidberard98/391/head 2025-09-07T06:39:17.5603727Z * [new branch] gh/davidberard98/391/orig -> origin/gh/davidberard98/391/orig 2025-09-07T06:39:17.5603809Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-09-07T06:39:17.5603893Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-09-07T06:39:17.5603975Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-09-07T06:39:17.5604058Z * [new branch] gh/davidberard98/394/base -> origin/gh/davidberard98/394/base 2025-09-07T06:39:17.5604166Z * [new branch] gh/davidberard98/394/head -> origin/gh/davidberard98/394/head 2025-09-07T06:39:17.5604248Z * [new branch] gh/davidberard98/394/orig -> origin/gh/davidberard98/394/orig 2025-09-07T06:39:17.5604332Z * [new branch] gh/davidberard98/396/base -> origin/gh/davidberard98/396/base 2025-09-07T06:39:17.5605627Z * [new branch] gh/davidberard98/396/head -> origin/gh/davidberard98/396/head 2025-09-07T06:39:17.5605728Z * [new branch] gh/davidberard98/396/orig -> origin/gh/davidberard98/396/orig 2025-09-07T06:39:17.5605811Z * [new branch] gh/davidberard98/397/base -> origin/gh/davidberard98/397/base 2025-09-07T06:39:17.5605896Z * [new branch] gh/davidberard98/397/head -> origin/gh/davidberard98/397/head 2025-09-07T06:39:17.5605978Z * [new branch] gh/davidberard98/397/orig -> origin/gh/davidberard98/397/orig 2025-09-07T06:39:17.5606062Z * [new branch] gh/davidberard98/398/base -> origin/gh/davidberard98/398/base 
2025-09-07T06:39:17.5606145Z * [new branch] gh/davidberard98/398/head -> origin/gh/davidberard98/398/head 2025-09-07T06:39:17.5606228Z * [new branch] gh/davidberard98/398/orig -> origin/gh/davidberard98/398/orig 2025-09-07T06:39:17.5606310Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-09-07T06:39:17.5606393Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-09-07T06:39:17.5606475Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-09-07T06:39:17.5606624Z * [new branch] gh/davidberard98/400/base -> origin/gh/davidberard98/400/base 2025-09-07T06:39:17.5606710Z * [new branch] gh/davidberard98/400/head -> origin/gh/davidberard98/400/head 2025-09-07T06:39:17.5606794Z * [new branch] gh/davidberard98/400/orig -> origin/gh/davidberard98/400/orig 2025-09-07T06:39:17.5606880Z * [new branch] gh/davidberard98/401/base -> origin/gh/davidberard98/401/base 2025-09-07T06:39:17.5606963Z * [new branch] gh/davidberard98/401/head -> origin/gh/davidberard98/401/head 2025-09-07T06:39:17.5607046Z * [new branch] gh/davidberard98/401/orig -> origin/gh/davidberard98/401/orig 2025-09-07T06:39:17.5607131Z * [new branch] gh/davidberard98/402/base -> origin/gh/davidberard98/402/base 2025-09-07T06:39:17.5607213Z * [new branch] gh/davidberard98/402/head -> origin/gh/davidberard98/402/head 2025-09-07T06:39:17.5607296Z * [new branch] gh/davidberard98/402/orig -> origin/gh/davidberard98/402/orig 2025-09-07T06:39:17.5607381Z * [new branch] gh/davidberard98/403/base -> origin/gh/davidberard98/403/base 2025-09-07T06:39:17.5607463Z * [new branch] gh/davidberard98/403/head -> origin/gh/davidberard98/403/head 2025-09-07T06:39:17.5607547Z * [new branch] gh/davidberard98/403/orig -> origin/gh/davidberard98/403/orig 2025-09-07T06:39:17.5608766Z * [new branch] gh/davidberard98/404/base -> origin/gh/davidberard98/404/base 2025-09-07T06:39:17.5608911Z * [new branch] gh/davidberard98/404/head -> 
origin/gh/davidberard98/404/head 2025-09-07T06:39:17.5608995Z * [new branch] gh/davidberard98/404/orig -> origin/gh/davidberard98/404/orig 2025-09-07T06:39:17.5609078Z * [new branch] gh/davidberard98/405/base -> origin/gh/davidberard98/405/base 2025-09-07T06:39:17.5609162Z * [new branch] gh/davidberard98/405/head -> origin/gh/davidberard98/405/head 2025-09-07T06:39:17.5609245Z * [new branch] gh/davidberard98/405/orig -> origin/gh/davidberard98/405/orig 2025-09-07T06:39:17.5609327Z * [new branch] gh/davidberard98/406/base -> origin/gh/davidberard98/406/base 2025-09-07T06:39:17.5609452Z * [new branch] gh/davidberard98/406/head -> origin/gh/davidberard98/406/head 2025-09-07T06:39:17.5609535Z * [new branch] gh/davidberard98/406/orig -> origin/gh/davidberard98/406/orig 2025-09-07T06:39:17.5609620Z * [new branch] gh/davidberard98/407/base -> origin/gh/davidberard98/407/base 2025-09-07T06:39:17.5609703Z * [new branch] gh/davidberard98/407/head -> origin/gh/davidberard98/407/head 2025-09-07T06:39:17.5609787Z * [new branch] gh/davidberard98/407/orig -> origin/gh/davidberard98/407/orig 2025-09-07T06:39:17.5609870Z * [new branch] gh/davidberard98/408/base -> origin/gh/davidberard98/408/base 2025-09-07T06:39:17.5609953Z * [new branch] gh/davidberard98/408/head -> origin/gh/davidberard98/408/head 2025-09-07T06:39:17.5610036Z * [new branch] gh/davidberard98/408/orig -> origin/gh/davidberard98/408/orig 2025-09-07T06:39:17.5610119Z * [new branch] gh/davidberard98/409/base -> origin/gh/davidberard98/409/base 2025-09-07T06:39:17.5610204Z * [new branch] gh/davidberard98/409/head -> origin/gh/davidberard98/409/head 2025-09-07T06:39:17.5610287Z * [new branch] gh/davidberard98/409/orig -> origin/gh/davidberard98/409/orig 2025-09-07T06:39:17.5610368Z * [new branch] gh/desertfire/594/base -> origin/gh/desertfire/594/base 2025-09-07T06:39:17.5610449Z * [new branch] gh/desertfire/594/head -> origin/gh/desertfire/594/head 2025-09-07T06:39:17.5610526Z * [new branch] gh/desertfire/594/orig -> 
origin/gh/desertfire/594/orig 2025-09-07T06:39:17.5610604Z * [new branch] gh/desertfire/595/base -> origin/gh/desertfire/595/base 2025-09-07T06:39:17.5610683Z * [new branch] gh/desertfire/595/head -> origin/gh/desertfire/595/head 2025-09-07T06:39:17.5610759Z * [new branch] gh/desertfire/595/orig -> origin/gh/desertfire/595/orig 2025-09-07T06:39:17.5611965Z * [new branch] gh/desertfire/597/base -> origin/gh/desertfire/597/base 2025-09-07T06:39:17.5612059Z * [new branch] gh/desertfire/597/head -> origin/gh/desertfire/597/head 2025-09-07T06:39:17.5612144Z * [new branch] gh/desertfire/597/orig -> origin/gh/desertfire/597/orig 2025-09-07T06:39:17.5612220Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-09-07T06:39:17.5612297Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-09-07T06:39:17.5612375Z * [new branch] gh/drisspg/149/base -> origin/gh/drisspg/149/base 2025-09-07T06:39:17.5612449Z * [new branch] gh/drisspg/149/head -> origin/gh/drisspg/149/head 2025-09-07T06:39:17.5612527Z * [new branch] gh/drisspg/149/orig -> origin/gh/drisspg/149/orig 2025-09-07T06:39:17.5612604Z * [new branch] gh/drisspg/159/base -> origin/gh/drisspg/159/base 2025-09-07T06:39:17.5612680Z * [new branch] gh/drisspg/159/head -> origin/gh/drisspg/159/head 2025-09-07T06:39:17.5612754Z * [new branch] gh/drisspg/159/orig -> origin/gh/drisspg/159/orig 2025-09-07T06:39:17.5612866Z * [new branch] gh/drisspg/166/base -> origin/gh/drisspg/166/base 2025-09-07T06:39:17.5612940Z * [new branch] gh/drisspg/166/head -> origin/gh/drisspg/166/head 2025-09-07T06:39:17.5613014Z * [new branch] gh/drisspg/166/orig -> origin/gh/drisspg/166/orig 2025-09-07T06:39:17.5613088Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-09-07T06:39:17.5613164Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-09-07T06:39:17.5613238Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-09-07T06:39:17.5613311Z * [new branch] 
gh/drisspg/173/base -> origin/gh/drisspg/173/base 2025-09-07T06:39:17.5613409Z * [new branch] gh/drisspg/173/head -> origin/gh/drisspg/173/head 2025-09-07T06:39:17.5613483Z * [new branch] gh/drisspg/173/orig -> origin/gh/drisspg/173/orig 2025-09-07T06:39:17.5613558Z * [new branch] gh/drisspg/177/base -> origin/gh/drisspg/177/base 2025-09-07T06:39:17.5613632Z * [new branch] gh/drisspg/177/head -> origin/gh/drisspg/177/head 2025-09-07T06:39:17.5613706Z * [new branch] gh/drisspg/177/orig -> origin/gh/drisspg/177/orig 2025-09-07T06:39:17.5613780Z * [new branch] gh/drisspg/178/base -> origin/gh/drisspg/178/base 2025-09-07T06:39:17.5613853Z * [new branch] gh/drisspg/178/head -> origin/gh/drisspg/178/head 2025-09-07T06:39:17.5613925Z * [new branch] gh/drisspg/178/orig -> origin/gh/drisspg/178/orig 2025-09-07T06:39:17.5613998Z * [new branch] gh/drisspg/180/base -> origin/gh/drisspg/180/base 2025-09-07T06:39:17.5614072Z * [new branch] gh/drisspg/180/head -> origin/gh/drisspg/180/head 2025-09-07T06:39:17.5614145Z * [new branch] gh/drisspg/180/orig -> origin/gh/drisspg/180/orig 2025-09-07T06:39:17.5614219Z * [new branch] gh/drisspg/181/base -> origin/gh/drisspg/181/base 2025-09-07T06:39:17.5614292Z * [new branch] gh/drisspg/181/head -> origin/gh/drisspg/181/head 2025-09-07T06:39:17.5614364Z * [new branch] gh/drisspg/181/orig -> origin/gh/drisspg/181/orig 2025-09-07T06:39:17.5614437Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-09-07T06:39:17.5614509Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-09-07T06:39:17.5614583Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-09-07T06:39:17.5614658Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-09-07T06:39:17.5615904Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-09-07T06:39:17.5615994Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-09-07T06:39:17.5616071Z * [new branch] gh/drisspg/185/base -> 
origin/gh/drisspg/185/base 2025-09-07T06:39:17.5616147Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-09-07T06:39:17.5616220Z * [new branch] gh/drisspg/186/base -> origin/gh/drisspg/186/base 2025-09-07T06:39:17.5616294Z * [new branch] gh/drisspg/186/head -> origin/gh/drisspg/186/head 2025-09-07T06:39:17.5616367Z * [new branch] gh/drisspg/186/orig -> origin/gh/drisspg/186/orig 2025-09-07T06:39:17.5616439Z * [new branch] gh/drisspg/187/base -> origin/gh/drisspg/187/base 2025-09-07T06:39:17.5616583Z * [new branch] gh/drisspg/187/head -> origin/gh/drisspg/187/head 2025-09-07T06:39:17.5616658Z * [new branch] gh/drisspg/187/orig -> origin/gh/drisspg/187/orig 2025-09-07T06:39:17.5616731Z * [new branch] gh/drisspg/188/base -> origin/gh/drisspg/188/base 2025-09-07T06:39:17.5616848Z * [new branch] gh/drisspg/188/head -> origin/gh/drisspg/188/head 2025-09-07T06:39:17.5616921Z * [new branch] gh/drisspg/188/orig -> origin/gh/drisspg/188/orig 2025-09-07T06:39:17.5616992Z * [new branch] gh/drisspg/189/base -> origin/gh/drisspg/189/base 2025-09-07T06:39:17.5617069Z * [new branch] gh/drisspg/189/head -> origin/gh/drisspg/189/head 2025-09-07T06:39:17.5617145Z * [new branch] gh/drisspg/189/orig -> origin/gh/drisspg/189/orig 2025-09-07T06:39:17.5617217Z * [new branch] gh/drisspg/190/base -> origin/gh/drisspg/190/base 2025-09-07T06:39:17.5617335Z * [new branch] gh/drisspg/190/head -> origin/gh/drisspg/190/head 2025-09-07T06:39:17.5617408Z * [new branch] gh/drisspg/190/orig -> origin/gh/drisspg/190/orig 2025-09-07T06:39:17.5617482Z * [new branch] gh/drisspg/191/base -> origin/gh/drisspg/191/base 2025-09-07T06:39:17.5617556Z * [new branch] gh/drisspg/191/head -> origin/gh/drisspg/191/head 2025-09-07T06:39:17.5617632Z * [new branch] gh/drisspg/191/orig -> origin/gh/drisspg/191/orig 2025-09-07T06:39:17.5617705Z * [new branch] gh/drisspg/192/base -> origin/gh/drisspg/192/base 2025-09-07T06:39:17.5618969Z * [new branch] gh/drisspg/192/head -> 
origin/gh/drisspg/192/head 2025-09-07T06:39:17.5619053Z * [new branch] gh/drisspg/192/orig -> origin/gh/drisspg/192/orig 2025-09-07T06:39:17.5619128Z * [new branch] gh/drisspg/193/base -> origin/gh/drisspg/193/base 2025-09-07T06:39:17.5619203Z * [new branch] gh/drisspg/193/head -> origin/gh/drisspg/193/head 2025-09-07T06:39:17.5619276Z * [new branch] gh/drisspg/193/orig -> origin/gh/drisspg/193/orig 2025-09-07T06:39:17.5619349Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-09-07T06:39:17.5619422Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-09-07T06:39:17.5619495Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-09-07T06:39:17.5619566Z * [new branch] gh/drisspg/195/base -> origin/gh/drisspg/195/base 2025-09-07T06:39:17.5619640Z * [new branch] gh/drisspg/195/head -> origin/gh/drisspg/195/head 2025-09-07T06:39:17.5619712Z * [new branch] gh/drisspg/195/orig -> origin/gh/drisspg/195/orig 2025-09-07T06:39:17.5619785Z * [new branch] gh/drisspg/196/base -> origin/gh/drisspg/196/base 2025-09-07T06:39:17.5619860Z * [new branch] gh/drisspg/196/head -> origin/gh/drisspg/196/head 2025-09-07T06:39:17.5619933Z * [new branch] gh/drisspg/196/orig -> origin/gh/drisspg/196/orig 2025-09-07T06:39:17.5620007Z * [new branch] gh/drisspg/197/base -> origin/gh/drisspg/197/base 2025-09-07T06:39:17.5620079Z * [new branch] gh/drisspg/197/head -> origin/gh/drisspg/197/head 2025-09-07T06:39:17.5620153Z * [new branch] gh/drisspg/197/orig -> origin/gh/drisspg/197/orig 2025-09-07T06:39:17.5620225Z * [new branch] gh/drisspg/198/base -> origin/gh/drisspg/198/base 2025-09-07T06:39:17.5620300Z * [new branch] gh/drisspg/198/head -> origin/gh/drisspg/198/head 2025-09-07T06:39:17.5620376Z * [new branch] gh/drisspg/198/orig -> origin/gh/drisspg/198/orig 2025-09-07T06:39:17.5620450Z * [new branch] gh/drisspg/199/base -> origin/gh/drisspg/199/base 2025-09-07T06:39:17.5620525Z * [new branch] gh/drisspg/199/head -> 
origin/gh/drisspg/199/head 2025-09-07T06:39:17.5620598Z * [new branch] gh/drisspg/199/orig -> origin/gh/drisspg/199/orig 2025-09-07T06:39:17.5621858Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-09-07T06:39:17.5621943Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-09-07T06:39:17.5622025Z * [new branch] gh/eellison/784/base -> origin/gh/eellison/784/base 2025-09-07T06:39:17.5622102Z * [new branch] gh/eellison/784/head -> origin/gh/eellison/784/head 2025-09-07T06:39:17.5622177Z * [new branch] gh/eellison/784/orig -> origin/gh/eellison/784/orig 2025-09-07T06:39:17.5622255Z * [new branch] gh/eellison/785/base -> origin/gh/eellison/785/base 2025-09-07T06:39:17.5622362Z * [new branch] gh/eellison/785/head -> origin/gh/eellison/785/head 2025-09-07T06:39:17.5622440Z * [new branch] gh/eellison/785/orig -> origin/gh/eellison/785/orig 2025-09-07T06:39:17.5622514Z * [new branch] gh/eellison/789/base -> origin/gh/eellison/789/base 2025-09-07T06:39:17.5622590Z * [new branch] gh/eellison/789/head -> origin/gh/eellison/789/head 2025-09-07T06:39:17.5622664Z * [new branch] gh/eellison/789/orig -> origin/gh/eellison/789/orig 2025-09-07T06:39:17.5622738Z * [new branch] gh/eellison/800/base -> origin/gh/eellison/800/base 2025-09-07T06:39:17.5622811Z * [new branch] gh/eellison/800/head -> origin/gh/eellison/800/head 2025-09-07T06:39:17.5622885Z * [new branch] gh/eellison/800/orig -> origin/gh/eellison/800/orig 2025-09-07T06:39:17.5622963Z * [new branch] gh/eellison/801/base -> origin/gh/eellison/801/base 2025-09-07T06:39:17.5623038Z * [new branch] gh/eellison/801/head -> origin/gh/eellison/801/head 2025-09-07T06:39:17.5623111Z * [new branch] gh/eellison/801/orig -> origin/gh/eellison/801/orig 2025-09-07T06:39:17.5623187Z * [new branch] gh/eellison/802/base -> origin/gh/eellison/802/base 2025-09-07T06:39:17.5623260Z * [new branch] gh/eellison/802/head -> origin/gh/eellison/802/head 2025-09-07T06:39:17.5623333Z * [new branch] 
gh/eellison/802/orig -> origin/gh/eellison/802/orig 2025-09-07T06:39:17.5623407Z * [new branch] gh/eellison/805/base -> origin/gh/eellison/805/base 2025-09-07T06:39:17.5623481Z * [new branch] gh/eellison/805/head -> origin/gh/eellison/805/head 2025-09-07T06:39:17.5623554Z * [new branch] gh/eellison/805/orig -> origin/gh/eellison/805/orig 2025-09-07T06:39:17.5623628Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-09-07T06:39:17.5623705Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-09-07T06:39:17.5623778Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-09-07T06:39:17.5623854Z * [new branch] gh/eellison/809/base -> origin/gh/eellison/809/base 2025-09-07T06:39:17.5623928Z * [new branch] gh/eellison/809/head -> origin/gh/eellison/809/head 2025-09-07T06:39:17.5624002Z * [new branch] gh/eellison/809/orig -> origin/gh/eellison/809/orig 2025-09-07T06:39:17.5624076Z * [new branch] gh/eellison/813/base -> origin/gh/eellison/813/base 2025-09-07T06:39:17.5624150Z * [new branch] gh/eellison/813/head -> origin/gh/eellison/813/head 2025-09-07T06:39:17.5624224Z * [new branch] gh/eellison/813/orig -> origin/gh/eellison/813/orig 2025-09-07T06:39:17.5624297Z * [new branch] gh/eellison/814/base -> origin/gh/eellison/814/base 2025-09-07T06:39:17.5624374Z * [new branch] gh/eellison/814/head -> origin/gh/eellison/814/head 2025-09-07T06:39:17.5624447Z * [new branch] gh/eellison/814/orig -> origin/gh/eellison/814/orig 2025-09-07T06:39:17.5625754Z * [new branch] gh/eellison/815/base -> origin/gh/eellison/815/base 2025-09-07T06:39:17.5625843Z * [new branch] gh/eellison/815/head -> origin/gh/eellison/815/head 2025-09-07T06:39:17.5625918Z * [new branch] gh/eellison/815/orig -> origin/gh/eellison/815/orig 2025-09-07T06:39:17.5625992Z * [new branch] gh/eellison/816/base -> origin/gh/eellison/816/base 2025-09-07T06:39:17.5626065Z * [new branch] gh/eellison/816/head -> origin/gh/eellison/816/head 
2025-09-07T06:39:17.5626139Z * [new branch] gh/eellison/816/orig -> origin/gh/eellison/816/orig 2025-09-07T06:39:17.5626252Z * [new branch] gh/eellison/817/base -> origin/gh/eellison/817/base 2025-09-07T06:39:17.5626325Z * [new branch] gh/eellison/817/head -> origin/gh/eellison/817/head 2025-09-07T06:39:17.5626398Z * [new branch] gh/eellison/817/orig -> origin/gh/eellison/817/orig 2025-09-07T06:39:17.5626474Z * [new branch] gh/eellison/818/base -> origin/gh/eellison/818/base 2025-09-07T06:39:17.5626634Z * [new branch] gh/eellison/818/head -> origin/gh/eellison/818/head 2025-09-07T06:39:17.5626711Z * [new branch] gh/eellison/818/orig -> origin/gh/eellison/818/orig 2025-09-07T06:39:17.5626786Z * [new branch] gh/eellison/819/base -> origin/gh/eellison/819/base 2025-09-07T06:39:17.5626862Z * [new branch] gh/eellison/819/head -> origin/gh/eellison/819/head 2025-09-07T06:39:17.5626935Z * [new branch] gh/eellison/819/orig -> origin/gh/eellison/819/orig 2025-09-07T06:39:17.5627011Z * [new branch] gh/eellison/820/base -> origin/gh/eellison/820/base 2025-09-07T06:39:17.5627084Z * [new branch] gh/eellison/820/head -> origin/gh/eellison/820/head 2025-09-07T06:39:17.5627163Z * [new branch] gh/eellison/820/orig -> origin/gh/eellison/820/orig 2025-09-07T06:39:17.5627238Z * [new branch] gh/eellison/821/base -> origin/gh/eellison/821/base 2025-09-07T06:39:17.5627314Z * [new branch] gh/eellison/821/head -> origin/gh/eellison/821/head 2025-09-07T06:39:17.5627387Z * [new branch] gh/eellison/821/orig -> origin/gh/eellison/821/orig 2025-09-07T06:39:17.5627461Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-09-07T06:39:17.5627534Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-09-07T06:39:17.5628821Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-09-07T06:39:17.5628914Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-09-07T06:39:17.5628989Z * [new branch] gh/eellison/823/head -> 
origin/gh/eellison/823/head 2025-09-07T06:39:17.5629069Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-09-07T06:39:17.5629148Z * [new branch] gh/etaf/132/base -> origin/gh/etaf/132/base 2025-09-07T06:39:17.5629219Z * [new branch] gh/etaf/132/head -> origin/gh/etaf/132/head 2025-09-07T06:39:17.5629288Z * [new branch] gh/etaf/132/orig -> origin/gh/etaf/132/orig 2025-09-07T06:39:17.5629357Z * [new branch] gh/etaf/138/base -> origin/gh/etaf/138/base 2025-09-07T06:39:17.5629423Z * [new branch] gh/etaf/138/head -> origin/gh/etaf/138/head 2025-09-07T06:39:17.5629491Z * [new branch] gh/etaf/138/orig -> origin/gh/etaf/138/orig 2025-09-07T06:39:17.5629559Z * [new branch] gh/etaf/140/base -> origin/gh/etaf/140/base 2025-09-07T06:39:17.5629626Z * [new branch] gh/etaf/140/head -> origin/gh/etaf/140/head 2025-09-07T06:39:17.5629744Z * [new branch] gh/etaf/140/orig -> origin/gh/etaf/140/orig 2025-09-07T06:39:17.5629812Z * [new branch] gh/etaf/143/base -> origin/gh/etaf/143/base 2025-09-07T06:39:17.5629879Z * [new branch] gh/etaf/143/head -> origin/gh/etaf/143/head 2025-09-07T06:39:17.5629945Z * [new branch] gh/etaf/143/orig -> origin/gh/etaf/143/orig 2025-09-07T06:39:17.5630011Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-09-07T06:39:17.5630080Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-09-07T06:39:17.5630147Z * [new branch] gh/etaf/151/base -> origin/gh/etaf/151/base 2025-09-07T06:39:17.5630245Z * [new branch] gh/etaf/151/head -> origin/gh/etaf/151/head 2025-09-07T06:39:17.5630312Z * [new branch] gh/etaf/151/orig -> origin/gh/etaf/151/orig 2025-09-07T06:39:17.5630380Z * [new branch] gh/etaf/152/base -> origin/gh/etaf/152/base 2025-09-07T06:39:17.5630450Z * [new branch] gh/etaf/152/head -> origin/gh/etaf/152/head 2025-09-07T06:39:17.5630520Z * [new branch] gh/etaf/152/orig -> origin/gh/etaf/152/orig 2025-09-07T06:39:17.5630588Z * [new branch] gh/etaf/153/base -> origin/gh/etaf/153/base 
2025-09-07T06:39:17.5630653Z * [new branch] gh/etaf/153/head -> origin/gh/etaf/153/head 2025-09-07T06:39:17.5630721Z * [new branch] gh/etaf/153/orig -> origin/gh/etaf/153/orig 2025-09-07T06:39:17.5630788Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-09-07T06:39:17.5630856Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-09-07T06:39:17.5632094Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-09-07T06:39:17.5632176Z * [new branch] gh/etaf/155/base -> origin/gh/etaf/155/base 2025-09-07T06:39:17.5632243Z * [new branch] gh/etaf/155/head -> origin/gh/etaf/155/head 2025-09-07T06:39:17.5632310Z * [new branch] gh/etaf/155/orig -> origin/gh/etaf/155/orig 2025-09-07T06:39:17.5632379Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-09-07T06:39:17.5632447Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-09-07T06:39:17.5632513Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-09-07T06:39:17.5632581Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-09-07T06:39:17.5632650Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-09-07T06:39:17.5632718Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-09-07T06:39:17.5632787Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-09-07T06:39:17.5632855Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-09-07T06:39:17.5632923Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-09-07T06:39:17.5632991Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-09-07T06:39:17.5633059Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-09-07T06:39:17.5633127Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-09-07T06:39:17.5633194Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-09-07T06:39:17.5633264Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-09-07T06:39:17.5633331Z * [new branch] gh/etaf/160/orig -> 
origin/gh/etaf/160/orig 2025-09-07T06:39:17.5633399Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-09-07T06:39:17.5633518Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-09-07T06:39:17.5633587Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-09-07T06:39:17.5633654Z * [new branch] gh/etaf/162/base -> origin/gh/etaf/162/base 2025-09-07T06:39:17.5634910Z * [new branch] gh/etaf/162/head -> origin/gh/etaf/162/head 2025-09-07T06:39:17.5634985Z * [new branch] gh/etaf/162/orig -> origin/gh/etaf/162/orig 2025-09-07T06:39:17.5635055Z * [new branch] gh/etaf/163/base -> origin/gh/etaf/163/base 2025-09-07T06:39:17.5635158Z * [new branch] gh/etaf/163/head -> origin/gh/etaf/163/head 2025-09-07T06:39:17.5635226Z * [new branch] gh/etaf/163/orig -> origin/gh/etaf/163/orig 2025-09-07T06:39:17.5635292Z * [new branch] gh/etaf/164/base -> origin/gh/etaf/164/base 2025-09-07T06:39:17.5635363Z * [new branch] gh/etaf/164/head -> origin/gh/etaf/164/head 2025-09-07T06:39:17.5635430Z * [new branch] gh/etaf/164/orig -> origin/gh/etaf/164/orig 2025-09-07T06:39:17.5635497Z * [new branch] gh/etaf/165/base -> origin/gh/etaf/165/base 2025-09-07T06:39:17.5635566Z * [new branch] gh/etaf/165/orig -> origin/gh/etaf/165/orig 2025-09-07T06:39:17.5635634Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-09-07T06:39:17.5635701Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-09-07T06:39:17.5635771Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-09-07T06:39:17.5635838Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-09-07T06:39:17.5635906Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-09-07T06:39:17.5635976Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-09-07T06:39:17.5636043Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-09-07T06:39:17.5636110Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-09-07T06:39:17.5636178Z * [new 
branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-09-07T06:39:17.5636249Z * [new branch] gh/etaf/169/base -> origin/gh/etaf/169/base 2025-09-07T06:39:17.5636321Z * [new branch] gh/etaf/169/head -> origin/gh/etaf/169/head 2025-09-07T06:39:17.5636391Z * [new branch] gh/etaf/169/orig -> origin/gh/etaf/169/orig 2025-09-07T06:39:17.5637760Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-09-07T06:39:17.5637852Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-09-07T06:39:17.5637936Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-09-07T06:39:17.5638091Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-09-07T06:39:17.5638174Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-09-07T06:39:17.5638253Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-09-07T06:39:17.5638334Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-09-07T06:39:17.5638412Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-09-07T06:39:17.5638490Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-09-07T06:39:17.5638566Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-09-07T06:39:17.5638688Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-09-07T06:39:17.5638761Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-09-07T06:39:17.5638833Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-09-07T06:39:17.5638904Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-09-07T06:39:17.5638975Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-09-07T06:39:17.5639047Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-09-07T06:39:17.5639118Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-09-07T06:39:17.5639225Z * [new 
branch] gh/ezyang/3074/base -> origin/gh/ezyang/3074/base 2025-09-07T06:39:17.5639298Z * [new branch] gh/ezyang/3074/head -> origin/gh/ezyang/3074/head 2025-09-07T06:39:17.5639370Z * [new branch] gh/ezyang/3074/orig -> origin/gh/ezyang/3074/orig 2025-09-07T06:39:17.5639441Z * [new branch] gh/ezyang/3088/base -> origin/gh/ezyang/3088/base 2025-09-07T06:39:17.5639515Z * [new branch] gh/ezyang/3088/head -> origin/gh/ezyang/3088/head 2025-09-07T06:39:17.5639588Z * [new branch] gh/ezyang/3088/orig -> origin/gh/ezyang/3088/orig 2025-09-07T06:39:17.5639660Z * [new branch] gh/ezyang/3092/base -> origin/gh/ezyang/3092/base 2025-09-07T06:39:17.5639732Z * [new branch] gh/ezyang/3092/head -> origin/gh/ezyang/3092/head 2025-09-07T06:39:17.5639808Z * [new branch] gh/ezyang/3092/orig -> origin/gh/ezyang/3092/orig 2025-09-07T06:39:17.5639881Z * [new branch] gh/ezyang/3103/base -> origin/gh/ezyang/3103/base 2025-09-07T06:39:17.5639953Z * [new branch] gh/ezyang/3103/head -> origin/gh/ezyang/3103/head 2025-09-07T06:39:17.5640028Z * [new branch] gh/ezyang/3103/orig -> origin/gh/ezyang/3103/orig 2025-09-07T06:39:17.5640099Z * [new branch] gh/ezyang/3105/base -> origin/gh/ezyang/3105/base 2025-09-07T06:39:17.5640171Z * [new branch] gh/ezyang/3105/head -> origin/gh/ezyang/3105/head 2025-09-07T06:39:17.5640244Z * [new branch] gh/ezyang/3105/orig -> origin/gh/ezyang/3105/orig 2025-09-07T06:39:17.5640315Z * [new branch] gh/ezyang/3114/base -> origin/gh/ezyang/3114/base 2025-09-07T06:39:17.5640387Z * [new branch] gh/ezyang/3114/head -> origin/gh/ezyang/3114/head 2025-09-07T06:39:17.5640461Z * [new branch] gh/ezyang/3114/orig -> origin/gh/ezyang/3114/orig 2025-09-07T06:39:17.5640532Z * [new branch] gh/ezyang/3116/base -> origin/gh/ezyang/3116/base 2025-09-07T06:39:17.5640603Z * [new branch] gh/ezyang/3116/head -> origin/gh/ezyang/3116/head 2025-09-07T06:39:17.5641891Z * [new branch] gh/ezyang/3116/orig -> origin/gh/ezyang/3116/orig 2025-09-07T06:39:17.5641979Z * [new branch] 
gh/ezyang/3120/base -> origin/gh/ezyang/3120/base 2025-09-07T06:39:17.5642053Z * [new branch] gh/ezyang/3120/head -> origin/gh/ezyang/3120/head 2025-09-07T06:39:17.5642126Z * [new branch] gh/ezyang/3120/orig -> origin/gh/ezyang/3120/orig 2025-09-07T06:39:17.5642200Z * [new branch] gh/ezyang/3122/base -> origin/gh/ezyang/3122/base 2025-09-07T06:39:17.5642272Z * [new branch] gh/ezyang/3122/head -> origin/gh/ezyang/3122/head 2025-09-07T06:39:17.5642345Z * [new branch] gh/ezyang/3122/orig -> origin/gh/ezyang/3122/orig 2025-09-07T06:39:17.5642418Z * [new branch] gh/ezyang/3123/base -> origin/gh/ezyang/3123/base 2025-09-07T06:39:17.5642489Z * [new branch] gh/ezyang/3123/head -> origin/gh/ezyang/3123/head 2025-09-07T06:39:17.5642612Z * [new branch] gh/ezyang/3123/orig -> origin/gh/ezyang/3123/orig 2025-09-07T06:39:17.5642686Z * [new branch] gh/ezyang/3125/base -> origin/gh/ezyang/3125/base 2025-09-07T06:39:17.5642758Z * [new branch] gh/ezyang/3125/head -> origin/gh/ezyang/3125/head 2025-09-07T06:39:17.5642830Z * [new branch] gh/ezyang/3125/orig -> origin/gh/ezyang/3125/orig 2025-09-07T06:39:17.5642908Z * [new branch] gh/ezyang/3126/base -> origin/gh/ezyang/3126/base 2025-09-07T06:39:17.5642980Z * [new branch] gh/ezyang/3126/head -> origin/gh/ezyang/3126/head 2025-09-07T06:39:17.5643078Z * [new branch] gh/ezyang/3126/orig -> origin/gh/ezyang/3126/orig 2025-09-07T06:39:17.5643150Z * [new branch] gh/ezyang/3127/base -> origin/gh/ezyang/3127/base 2025-09-07T06:39:17.5643224Z * [new branch] gh/ezyang/3127/head -> origin/gh/ezyang/3127/head 2025-09-07T06:39:17.5643296Z * [new branch] gh/ezyang/3127/orig -> origin/gh/ezyang/3127/orig 2025-09-07T06:39:17.5643369Z * [new branch] gh/ezyang/3128/base -> origin/gh/ezyang/3128/base 2025-09-07T06:39:17.5643441Z * [new branch] gh/ezyang/3128/head -> origin/gh/ezyang/3128/head 2025-09-07T06:39:17.5643513Z * [new branch] gh/ezyang/3128/orig -> origin/gh/ezyang/3128/orig 2025-09-07T06:39:17.5643588Z * [new branch] gh/ezyang/3129/base -> 
origin/gh/ezyang/3129/base 2025-09-07T06:39:17.5644896Z * [new branch] gh/ezyang/3129/head -> origin/gh/ezyang/3129/head 2025-09-07T06:39:17.5644974Z * [new branch] gh/ezyang/3129/orig -> origin/gh/ezyang/3129/orig 2025-09-07T06:39:17.5645050Z * [new branch] gh/ezyang/3130/base -> origin/gh/ezyang/3130/base 2025-09-07T06:39:17.5645129Z * [new branch] gh/ezyang/3130/head -> origin/gh/ezyang/3130/head 2025-09-07T06:39:17.5645202Z * [new branch] gh/ezyang/3130/orig -> origin/gh/ezyang/3130/orig 2025-09-07T06:39:17.5645273Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-09-07T06:39:17.5645347Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-09-07T06:39:17.5645418Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-09-07T06:39:17.5645490Z * [new branch] gh/ezyang/3132/base -> origin/gh/ezyang/3132/base 2025-09-07T06:39:17.5645563Z * [new branch] gh/ezyang/3132/head -> origin/gh/ezyang/3132/head 2025-09-07T06:39:17.5645636Z * [new branch] gh/ezyang/3132/orig -> origin/gh/ezyang/3132/orig 2025-09-07T06:39:17.5645711Z * [new branch] gh/ezyang/3133/base -> origin/gh/ezyang/3133/base 2025-09-07T06:39:17.5645788Z * [new branch] gh/ezyang/3133/head -> origin/gh/ezyang/3133/head 2025-09-07T06:39:17.5645862Z * [new branch] gh/ezyang/3133/orig -> origin/gh/ezyang/3133/orig 2025-09-07T06:39:17.5645934Z * [new branch] gh/ezyang/3134/base -> origin/gh/ezyang/3134/base 2025-09-07T06:39:17.5646007Z * [new branch] gh/ezyang/3134/head -> origin/gh/ezyang/3134/head 2025-09-07T06:39:17.5646079Z * [new branch] gh/ezyang/3134/orig -> origin/gh/ezyang/3134/orig 2025-09-07T06:39:17.5646149Z * [new branch] gh/ezyang/3135/base -> origin/gh/ezyang/3135/base 2025-09-07T06:39:17.5646226Z * [new branch] gh/ezyang/3135/head -> origin/gh/ezyang/3135/head 2025-09-07T06:39:17.5646300Z * [new branch] gh/ezyang/3135/orig -> origin/gh/ezyang/3135/orig 2025-09-07T06:39:17.5646371Z * [new branch] gh/ezyang/3136/base -> 
origin/gh/ezyang/3136/base 2025-09-07T06:39:17.5646476Z * [new branch] gh/ezyang/3136/head -> origin/gh/ezyang/3136/head 2025-09-07T06:39:17.5646634Z * [new branch] gh/ezyang/3136/orig -> origin/gh/ezyang/3136/orig 2025-09-07T06:39:17.5646706Z * [new branch] gh/ezyang/3137/base -> origin/gh/ezyang/3137/base 2025-09-07T06:39:17.5646779Z * [new branch] gh/ezyang/3137/head -> origin/gh/ezyang/3137/head 2025-09-07T06:39:17.5646852Z * [new branch] gh/ezyang/3137/orig -> origin/gh/ezyang/3137/orig 2025-09-07T06:39:17.5646923Z * [new branch] gh/ezyang/3138/base -> origin/gh/ezyang/3138/base 2025-09-07T06:39:17.5647038Z * [new branch] gh/ezyang/3138/head -> origin/gh/ezyang/3138/head 2025-09-07T06:39:17.5647110Z * [new branch] gh/ezyang/3138/orig -> origin/gh/ezyang/3138/orig 2025-09-07T06:39:17.5647180Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-09-07T06:39:17.5648634Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-09-07T06:39:17.5648721Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-09-07T06:39:17.5648797Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-09-07T06:39:17.5648869Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-09-07T06:39:17.5648943Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-09-07T06:39:17.5649018Z * [new branch] gh/ezyang/3141/base -> origin/gh/ezyang/3141/base 2025-09-07T06:39:17.5649092Z * [new branch] gh/ezyang/3141/head -> origin/gh/ezyang/3141/head 2025-09-07T06:39:17.5649163Z * [new branch] gh/ezyang/3141/orig -> origin/gh/ezyang/3141/orig 2025-09-07T06:39:17.5649239Z * [new branch] gh/ezyang/3142/base -> origin/gh/ezyang/3142/base 2025-09-07T06:39:17.5649315Z * [new branch] gh/ezyang/3142/head -> origin/gh/ezyang/3142/head 2025-09-07T06:39:17.5649386Z * [new branch] gh/ezyang/3142/orig -> origin/gh/ezyang/3142/orig 2025-09-07T06:39:17.5649459Z * [new branch] gh/ezyang/3143/base -> 
origin/gh/ezyang/3143/base 2025-09-07T06:39:17.5649530Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-09-07T06:39:17.5649605Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-09-07T06:39:17.5649680Z * [new branch] gh/fadara01/1/base -> origin/gh/fadara01/1/base 2025-09-07T06:39:17.5649754Z * [new branch] gh/fadara01/1/head -> origin/gh/fadara01/1/head 2025-09-07T06:39:17.5649826Z * [new branch] gh/fadara01/1/orig -> origin/gh/fadara01/1/orig 2025-09-07T06:39:17.5649899Z * [new branch] gh/fduwjj/171/base -> origin/gh/fduwjj/171/base 2025-09-07T06:39:17.5649973Z * [new branch] gh/fduwjj/171/head -> origin/gh/fduwjj/171/head 2025-09-07T06:39:17.5650044Z * [new branch] gh/fduwjj/171/orig -> origin/gh/fduwjj/171/orig 2025-09-07T06:39:17.5650114Z * [new branch] gh/fduwjj/175/base -> origin/gh/fduwjj/175/base 2025-09-07T06:39:17.5650185Z * [new branch] gh/fduwjj/175/head -> origin/gh/fduwjj/175/head 2025-09-07T06:39:17.5651650Z * [new branch] gh/fduwjj/175/orig -> origin/gh/fduwjj/175/orig 2025-09-07T06:39:17.5651738Z * [new branch] gh/fduwjj/176/base -> origin/gh/fduwjj/176/base 2025-09-07T06:39:17.5651812Z * [new branch] gh/fduwjj/176/head -> origin/gh/fduwjj/176/head 2025-09-07T06:39:17.5651885Z * [new branch] gh/fduwjj/176/orig -> origin/gh/fduwjj/176/orig 2025-09-07T06:39:17.5652006Z * [new branch] gh/fduwjj/177/base -> origin/gh/fduwjj/177/base 2025-09-07T06:39:17.5652080Z * [new branch] gh/fduwjj/177/head -> origin/gh/fduwjj/177/head 2025-09-07T06:39:17.5652152Z * [new branch] gh/fduwjj/177/orig -> origin/gh/fduwjj/177/orig 2025-09-07T06:39:17.5652222Z * [new branch] gh/fduwjj/178/base -> origin/gh/fduwjj/178/base 2025-09-07T06:39:17.5652292Z * [new branch] gh/fduwjj/178/head -> origin/gh/fduwjj/178/head 2025-09-07T06:39:17.5652363Z * [new branch] gh/fduwjj/178/orig -> origin/gh/fduwjj/178/orig 2025-09-07T06:39:17.5652432Z * [new branch] gh/fduwjj/179/base -> origin/gh/fduwjj/179/base 2025-09-07T06:39:17.5652532Z * [new 
branch] gh/fduwjj/179/head -> origin/gh/fduwjj/179/head 2025-09-07T06:39:17.5652602Z * [new branch] gh/fduwjj/179/orig -> origin/gh/fduwjj/179/orig 2025-09-07T06:39:17.5652673Z * [new branch] gh/fduwjj/180/base -> origin/gh/fduwjj/180/base 2025-09-07T06:39:17.5652745Z * [new branch] gh/fduwjj/180/head -> origin/gh/fduwjj/180/head 2025-09-07T06:39:17.5652815Z * [new branch] gh/fduwjj/180/orig -> origin/gh/fduwjj/180/orig 2025-09-07T06:39:17.5652886Z * [new branch] gh/fduwjj/181/base -> origin/gh/fduwjj/181/base 2025-09-07T06:39:17.5652957Z * [new branch] gh/fduwjj/181/head -> origin/gh/fduwjj/181/head 2025-09-07T06:39:17.5653030Z * [new branch] gh/fduwjj/181/orig -> origin/gh/fduwjj/181/orig 2025-09-07T06:39:17.5653099Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-09-07T06:39:17.5653172Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-09-07T06:39:17.5653242Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-09-07T06:39:17.5653313Z * [new branch] gh/fduwjj/183/base -> origin/gh/fduwjj/183/base 2025-09-07T06:39:17.5654654Z * [new branch] gh/fduwjj/183/head -> origin/gh/fduwjj/183/head 2025-09-07T06:39:17.5654736Z * [new branch] gh/fduwjj/183/orig -> origin/gh/fduwjj/183/orig 2025-09-07T06:39:17.5654810Z * [new branch] gh/fduwjj/184/base -> origin/gh/fduwjj/184/base 2025-09-07T06:39:17.5654882Z * [new branch] gh/fduwjj/184/head -> origin/gh/fduwjj/184/head 2025-09-07T06:39:17.5654956Z * [new branch] gh/fduwjj/184/orig -> origin/gh/fduwjj/184/orig 2025-09-07T06:39:17.5655028Z * [new branch] gh/fduwjj/185/base -> origin/gh/fduwjj/185/base 2025-09-07T06:39:17.5655101Z * [new branch] gh/fduwjj/185/head -> origin/gh/fduwjj/185/head 2025-09-07T06:39:17.5655173Z * [new branch] gh/fduwjj/185/orig -> origin/gh/fduwjj/185/orig 2025-09-07T06:39:17.5655245Z * [new branch] gh/fduwjj/186/base -> origin/gh/fduwjj/186/base 2025-09-07T06:39:17.5655316Z * [new branch] gh/fduwjj/186/head -> origin/gh/fduwjj/186/head 
2025-09-07T06:39:17.5655386Z * [new branch] gh/fduwjj/186/orig -> origin/gh/fduwjj/186/orig 2025-09-07T06:39:17.5655455Z * [new branch] gh/fduwjj/187/base -> origin/gh/fduwjj/187/base 2025-09-07T06:39:17.5655529Z * [new branch] gh/fduwjj/187/head -> origin/gh/fduwjj/187/head 2025-09-07T06:39:17.5655600Z * [new branch] gh/fduwjj/187/orig -> origin/gh/fduwjj/187/orig 2025-09-07T06:39:17.5655673Z * [new branch] gh/fduwjj/188/base -> origin/gh/fduwjj/188/base 2025-09-07T06:39:17.5655748Z * [new branch] gh/fduwjj/188/head -> origin/gh/fduwjj/188/head 2025-09-07T06:39:17.5655819Z * [new branch] gh/fduwjj/188/orig -> origin/gh/fduwjj/188/orig 2025-09-07T06:39:17.5655925Z * [new branch] gh/fduwjj/189/base -> origin/gh/fduwjj/189/base 2025-09-07T06:39:17.5655997Z * [new branch] gh/fduwjj/189/head -> origin/gh/fduwjj/189/head 2025-09-07T06:39:17.5656069Z * [new branch] gh/fduwjj/189/orig -> origin/gh/fduwjj/189/orig 2025-09-07T06:39:17.5656139Z * [new branch] gh/fduwjj/190/base -> origin/gh/fduwjj/190/base 2025-09-07T06:39:17.5656208Z * [new branch] gh/fduwjj/190/head -> origin/gh/fduwjj/190/head 2025-09-07T06:39:17.5656281Z * [new branch] gh/fduwjj/190/orig -> origin/gh/fduwjj/190/orig 2025-09-07T06:39:17.5656377Z * [new branch] gh/fduwjj/191/base -> origin/gh/fduwjj/191/base 2025-09-07T06:39:17.5656450Z * [new branch] gh/fduwjj/191/head -> origin/gh/fduwjj/191/head 2025-09-07T06:39:17.5656614Z * [new branch] gh/fduwjj/191/orig -> origin/gh/fduwjj/191/orig 2025-09-07T06:39:17.5656694Z * [new branch] gh/fegin/306/base -> origin/gh/fegin/306/base 2025-09-07T06:39:17.5656765Z * [new branch] gh/fegin/306/head -> origin/gh/fegin/306/head 2025-09-07T06:39:17.5656834Z * [new branch] gh/fegin/306/orig -> origin/gh/fegin/306/orig 2025-09-07T06:39:17.5656904Z * [new branch] gh/fegin/307/base -> origin/gh/fegin/307/base 2025-09-07T06:39:17.5656972Z * [new branch] gh/fegin/307/head -> origin/gh/fegin/307/head 2025-09-07T06:39:17.5657041Z * [new branch] gh/fegin/307/orig -> 
origin/gh/fegin/307/orig 2025-09-07T06:39:17.5657112Z * [new branch] gh/fegin/308/base -> origin/gh/fegin/308/base 2025-09-07T06:39:17.5657181Z * [new branch] gh/fegin/308/head -> origin/gh/fegin/308/head 2025-09-07T06:39:17.5657249Z * [new branch] gh/fegin/308/orig -> origin/gh/fegin/308/orig 2025-09-07T06:39:17.5658562Z * [new branch] gh/fegin/309/base -> origin/gh/fegin/309/base 2025-09-07T06:39:17.5658642Z * [new branch] gh/fegin/309/head -> origin/gh/fegin/309/head 2025-09-07T06:39:17.5658711Z * [new branch] gh/fegin/309/orig -> origin/gh/fegin/309/orig 2025-09-07T06:39:17.5658782Z * [new branch] gh/fegin/310/base -> origin/gh/fegin/310/base 2025-09-07T06:39:17.5658851Z * [new branch] gh/fegin/310/head -> origin/gh/fegin/310/head 2025-09-07T06:39:17.5658923Z * [new branch] gh/fegin/310/orig -> origin/gh/fegin/310/orig 2025-09-07T06:39:17.5658994Z * [new branch] gh/fegin/311/base -> origin/gh/fegin/311/base 2025-09-07T06:39:17.5659068Z * [new branch] gh/fegin/311/head -> origin/gh/fegin/311/head 2025-09-07T06:39:17.5659136Z * [new branch] gh/fegin/311/orig -> origin/gh/fegin/311/orig 2025-09-07T06:39:17.5659206Z * [new branch] gh/fegin/312/base -> origin/gh/fegin/312/base 2025-09-07T06:39:17.5659276Z * [new branch] gh/fegin/312/head -> origin/gh/fegin/312/head 2025-09-07T06:39:17.5659346Z * [new branch] gh/fegin/312/orig -> origin/gh/fegin/312/orig 2025-09-07T06:39:17.5659414Z * [new branch] gh/fegin/313/base -> origin/gh/fegin/313/base 2025-09-07T06:39:17.5659483Z * [new branch] gh/fegin/313/head -> origin/gh/fegin/313/head 2025-09-07T06:39:17.5659552Z * [new branch] gh/fegin/313/orig -> origin/gh/fegin/313/orig 2025-09-07T06:39:17.5659628Z * [new branch] gh/fffrog/124/base -> origin/gh/fffrog/124/base 2025-09-07T06:39:17.5659700Z * [new branch] gh/fffrog/124/head -> origin/gh/fffrog/124/head 2025-09-07T06:39:17.5659771Z * [new branch] gh/fffrog/124/orig -> origin/gh/fffrog/124/orig 2025-09-07T06:39:17.5659903Z * [new branch] gh/fffrog/129/base -> 
origin/gh/fffrog/129/base 2025-09-07T06:39:17.5659976Z * [new branch] gh/fffrog/129/head -> origin/gh/fffrog/129/head 2025-09-07T06:39:17.5660050Z * [new branch] gh/fffrog/129/orig -> origin/gh/fffrog/129/orig 2025-09-07T06:39:17.5660121Z * [new branch] gh/fffrog/130/base -> origin/gh/fffrog/130/base 2025-09-07T06:39:17.5661584Z * [new branch] gh/fffrog/130/head -> origin/gh/fffrog/130/head 2025-09-07T06:39:17.5661667Z * [new branch] gh/fffrog/130/orig -> origin/gh/fffrog/130/orig 2025-09-07T06:39:17.5661785Z * [new branch] gh/fffrog/131/base -> origin/gh/fffrog/131/base 2025-09-07T06:39:17.5661858Z * [new branch] gh/fffrog/131/head -> origin/gh/fffrog/131/head 2025-09-07T06:39:17.5661928Z * [new branch] gh/fffrog/131/orig -> origin/gh/fffrog/131/orig 2025-09-07T06:39:17.5662004Z * [new branch] gh/fffrog/132/base -> origin/gh/fffrog/132/base 2025-09-07T06:39:17.5662076Z * [new branch] gh/fffrog/132/head -> origin/gh/fffrog/132/head 2025-09-07T06:39:17.5662146Z * [new branch] gh/fffrog/132/orig -> origin/gh/fffrog/132/orig 2025-09-07T06:39:17.5662219Z * [new branch] gh/fffrog/133/base -> origin/gh/fffrog/133/base 2025-09-07T06:39:17.5662289Z * [new branch] gh/fffrog/133/head -> origin/gh/fffrog/133/head 2025-09-07T06:39:17.5662363Z * [new branch] gh/fffrog/133/orig -> origin/gh/fffrog/133/orig 2025-09-07T06:39:17.5662435Z * [new branch] gh/fffrog/134/base -> origin/gh/fffrog/134/base 2025-09-07T06:39:17.5662506Z * [new branch] gh/fffrog/134/head -> origin/gh/fffrog/134/head 2025-09-07T06:39:17.5662577Z * [new branch] gh/fffrog/134/orig -> origin/gh/fffrog/134/orig 2025-09-07T06:39:17.5662649Z * [new branch] gh/fffrog/135/base -> origin/gh/fffrog/135/base 2025-09-07T06:39:17.5662720Z * [new branch] gh/fffrog/135/head -> origin/gh/fffrog/135/head 2025-09-07T06:39:17.5662790Z * [new branch] gh/fffrog/135/orig -> origin/gh/fffrog/135/orig 2025-09-07T06:39:17.5662862Z * [new branch] gh/fffrog/136/base -> origin/gh/fffrog/136/base 2025-09-07T06:39:17.5662932Z * [new 
branch] gh/fffrog/136/head -> origin/gh/fffrog/136/head 2025-09-07T06:39:17.5663004Z * [new branch] gh/fffrog/136/orig -> origin/gh/fffrog/136/orig 2025-09-07T06:39:17.5663077Z * [new branch] gh/fffrog/137/base -> origin/gh/fffrog/137/base 2025-09-07T06:39:17.5663148Z * [new branch] gh/fffrog/137/head -> origin/gh/fffrog/137/head 2025-09-07T06:39:17.5663222Z * [new branch] gh/fffrog/137/orig -> origin/gh/fffrog/137/orig 2025-09-07T06:39:17.5664512Z * [new branch] gh/fffrog/138/base -> origin/gh/fffrog/138/base 2025-09-07T06:39:17.5664599Z * [new branch] gh/fffrog/138/head -> origin/gh/fffrog/138/head 2025-09-07T06:39:17.5664673Z * [new branch] gh/fffrog/138/orig -> origin/gh/fffrog/138/orig 2025-09-07T06:39:17.5664747Z * [new branch] gh/fffrog/139/base -> origin/gh/fffrog/139/base 2025-09-07T06:39:17.5664823Z * [new branch] gh/fffrog/139/head -> origin/gh/fffrog/139/head 2025-09-07T06:39:17.5664897Z * [new branch] gh/fffrog/139/orig -> origin/gh/fffrog/139/orig 2025-09-07T06:39:17.5664972Z * [new branch] gh/fffrog/140/base -> origin/gh/fffrog/140/base 2025-09-07T06:39:17.5665042Z * [new branch] gh/fffrog/140/head -> origin/gh/fffrog/140/head 2025-09-07T06:39:17.5665152Z * [new branch] gh/fffrog/140/orig -> origin/gh/fffrog/140/orig 2025-09-07T06:39:17.5665225Z * [new branch] gh/fffrog/141/base -> origin/gh/fffrog/141/base 2025-09-07T06:39:17.5665297Z * [new branch] gh/fffrog/141/head -> origin/gh/fffrog/141/head 2025-09-07T06:39:17.5665369Z * [new branch] gh/fffrog/141/orig -> origin/gh/fffrog/141/orig 2025-09-07T06:39:17.5665440Z * [new branch] gh/fffrog/142/base -> origin/gh/fffrog/142/base 2025-09-07T06:39:17.5665509Z * [new branch] gh/fffrog/142/head -> origin/gh/fffrog/142/head 2025-09-07T06:39:17.5665581Z * [new branch] gh/fffrog/142/orig -> origin/gh/fffrog/142/orig 2025-09-07T06:39:17.5665680Z * [new branch] gh/fffrog/143/base -> origin/gh/fffrog/143/base 2025-09-07T06:39:17.5665749Z * [new branch] gh/fffrog/143/head -> origin/gh/fffrog/143/head 
2025-09-07T06:39:17.5665824Z * [new branch] gh/fffrog/143/orig -> origin/gh/fffrog/143/orig 2025-09-07T06:39:17.5665893Z * [new branch] gh/fffrog/144/base -> origin/gh/fffrog/144/base 2025-09-07T06:39:17.5665964Z * [new branch] gh/fffrog/144/head -> origin/gh/fffrog/144/head 2025-09-07T06:39:17.5666034Z * [new branch] gh/fffrog/144/orig -> origin/gh/fffrog/144/orig 2025-09-07T06:39:17.5666105Z * [new branch] gh/fffrog/145/base -> origin/gh/fffrog/145/base 2025-09-07T06:39:17.5666176Z * [new branch] gh/fffrog/145/head -> origin/gh/fffrog/145/head 2025-09-07T06:39:17.5666247Z * [new branch] gh/fffrog/145/orig -> origin/gh/fffrog/145/orig 2025-09-07T06:39:17.5666320Z * [new branch] gh/fffrog/146/base -> origin/gh/fffrog/146/base 2025-09-07T06:39:17.5666390Z * [new branch] gh/fffrog/146/head -> origin/gh/fffrog/146/head 2025-09-07T06:39:17.5666461Z * [new branch] gh/fffrog/146/orig -> origin/gh/fffrog/146/orig 2025-09-07T06:39:17.5666644Z * [new branch] gh/fffrog/147/base -> origin/gh/fffrog/147/base 2025-09-07T06:39:17.5666716Z * [new branch] gh/fffrog/147/head -> origin/gh/fffrog/147/head 2025-09-07T06:39:17.5666786Z * [new branch] gh/fffrog/147/orig -> origin/gh/fffrog/147/orig 2025-09-07T06:39:17.5666857Z * [new branch] gh/fffrog/148/base -> origin/gh/fffrog/148/base 2025-09-07T06:39:17.5666928Z * [new branch] gh/fffrog/148/head -> origin/gh/fffrog/148/head 2025-09-07T06:39:17.5666997Z * [new branch] gh/fffrog/148/orig -> origin/gh/fffrog/148/orig 2025-09-07T06:39:17.5667070Z * [new branch] gh/fffrog/149/base -> origin/gh/fffrog/149/base 2025-09-07T06:39:17.5667140Z * [new branch] gh/fffrog/149/head -> origin/gh/fffrog/149/head 2025-09-07T06:39:17.5667212Z * [new branch] gh/fffrog/149/orig -> origin/gh/fffrog/149/orig 2025-09-07T06:39:17.5667284Z * [new branch] gh/fffrog/150/base -> origin/gh/fffrog/150/base 2025-09-07T06:39:17.5667354Z * [new branch] gh/fffrog/150/head -> origin/gh/fffrog/150/head 2025-09-07T06:39:17.5667426Z * [new branch] gh/fffrog/150/orig -> 
origin/gh/fffrog/150/orig 2025-09-07T06:39:17.5667497Z * [new branch] gh/fffrog/151/base -> origin/gh/fffrog/151/base 2025-09-07T06:39:17.5667566Z * [new branch] gh/fffrog/151/head -> origin/gh/fffrog/151/head 2025-09-07T06:39:17.5667635Z * [new branch] gh/fffrog/151/orig -> origin/gh/fffrog/151/orig 2025-09-07T06:39:17.5669124Z * [new branch] gh/fffrog/152/base -> origin/gh/fffrog/152/base 2025-09-07T06:39:17.5669206Z * [new branch] gh/fffrog/152/head -> origin/gh/fffrog/152/head 2025-09-07T06:39:17.5669327Z * [new branch] gh/fffrog/153/base -> origin/gh/fffrog/153/base 2025-09-07T06:39:17.5669406Z * [new branch] gh/fffrog/153/head -> origin/gh/fffrog/153/head 2025-09-07T06:39:17.5669477Z * [new branch] gh/fffrog/153/orig -> origin/gh/fffrog/153/orig 2025-09-07T06:39:17.5669555Z * [new branch] gh/gmagogsfm/1/base -> origin/gh/gmagogsfm/1/base 2025-09-07T06:39:17.5669633Z * [new branch] gh/gmagogsfm/1/head -> origin/gh/gmagogsfm/1/head 2025-09-07T06:39:17.5669706Z * [new branch] gh/gmagogsfm/1/orig -> origin/gh/gmagogsfm/1/orig 2025-09-07T06:39:17.5669822Z * [new branch] gh/gmagogsfm/2/base -> origin/gh/gmagogsfm/2/base 2025-09-07T06:39:17.5669898Z * [new branch] gh/gmagogsfm/2/head -> origin/gh/gmagogsfm/2/head 2025-09-07T06:39:17.5669972Z * [new branch] gh/gmagogsfm/2/orig -> origin/gh/gmagogsfm/2/orig 2025-09-07T06:39:17.5670046Z * [new branch] gh/gmagogsfm/3/base -> origin/gh/gmagogsfm/3/base 2025-09-07T06:39:17.5670121Z * [new branch] gh/gmagogsfm/3/head -> origin/gh/gmagogsfm/3/head 2025-09-07T06:39:17.5670197Z * [new branch] gh/gmagogsfm/3/orig -> origin/gh/gmagogsfm/3/orig 2025-09-07T06:39:17.5670276Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-09-07T06:39:17.5670353Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-09-07T06:39:17.5670428Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-09-07T06:39:17.5670505Z * [new branch] gh/guangyey/135/base -> origin/gh/guangyey/135/base 
2025-09-07T06:39:17.5670580Z * [new branch] gh/guangyey/135/head -> origin/gh/guangyey/135/head 2025-09-07T06:39:17.5670655Z * [new branch] gh/guangyey/135/orig -> origin/gh/guangyey/135/orig 2025-09-07T06:39:17.5670730Z * [new branch] gh/guangyey/139/base -> origin/gh/guangyey/139/base 2025-09-07T06:39:17.5670804Z * [new branch] gh/guangyey/139/head -> origin/gh/guangyey/139/head 2025-09-07T06:39:17.5670879Z * [new branch] gh/guangyey/139/orig -> origin/gh/guangyey/139/orig 2025-09-07T06:39:17.5672278Z * [new branch] gh/guangyey/140/base -> origin/gh/guangyey/140/base 2025-09-07T06:39:17.5672369Z * [new branch] gh/guangyey/140/head -> origin/gh/guangyey/140/head 2025-09-07T06:39:17.5672448Z * [new branch] gh/guangyey/140/orig -> origin/gh/guangyey/140/orig 2025-09-07T06:39:17.5672523Z * [new branch] gh/guangyey/142/base -> origin/gh/guangyey/142/base 2025-09-07T06:39:17.5672598Z * [new branch] gh/guangyey/142/head -> origin/gh/guangyey/142/head 2025-09-07T06:39:17.5672677Z * [new branch] gh/guangyey/142/orig -> origin/gh/guangyey/142/orig 2025-09-07T06:39:17.5672752Z * [new branch] gh/guangyey/145/base -> origin/gh/guangyey/145/base 2025-09-07T06:39:17.5672827Z * [new branch] gh/guangyey/145/head -> origin/gh/guangyey/145/head 2025-09-07T06:39:17.5672901Z * [new branch] gh/guangyey/145/orig -> origin/gh/guangyey/145/orig 2025-09-07T06:39:17.5672974Z * [new branch] gh/guangyey/153/base -> origin/gh/guangyey/153/base 2025-09-07T06:39:17.5673049Z * [new branch] gh/guangyey/153/head -> origin/gh/guangyey/153/head 2025-09-07T06:39:17.5673122Z * [new branch] gh/guangyey/153/orig -> origin/gh/guangyey/153/orig 2025-09-07T06:39:17.5673202Z * [new branch] gh/guangyey/159/base -> origin/gh/guangyey/159/base 2025-09-07T06:39:17.5673275Z * [new branch] gh/guangyey/159/head -> origin/gh/guangyey/159/head 2025-09-07T06:39:17.5673390Z * [new branch] gh/guangyey/159/orig -> origin/gh/guangyey/159/orig 2025-09-07T06:39:17.5673467Z * [new branch] gh/guangyey/163/base -> 
origin/gh/guangyey/163/base 2025-09-07T06:39:17.5673541Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-09-07T06:39:17.5673615Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-09-07T06:39:17.5673688Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-09-07T06:39:17.5673762Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-09-07T06:39:17.5673836Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-09-07T06:39:17.5673958Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-09-07T06:39:17.5674034Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-09-07T06:39:17.5674110Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-09-07T06:39:17.5674187Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-09-07T06:39:17.5674261Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-09-07T06:39:17.5674335Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-09-07T06:39:17.5674409Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-09-07T06:39:17.5674482Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-09-07T06:39:17.5674560Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-09-07T06:39:17.5674638Z * [new branch] gh/guangyey/174/base -> origin/gh/guangyey/174/base 2025-09-07T06:39:17.5674712Z * [new branch] gh/guangyey/174/head -> origin/gh/guangyey/174/head 2025-09-07T06:39:17.5674788Z * [new branch] gh/guangyey/174/orig -> origin/gh/guangyey/174/orig 2025-09-07T06:39:17.5674863Z * [new branch] gh/guangyey/176/base -> origin/gh/guangyey/176/base 2025-09-07T06:39:17.5676258Z * [new branch] gh/guangyey/176/head -> origin/gh/guangyey/176/head 2025-09-07T06:39:17.5676344Z * [new branch] gh/guangyey/176/orig -> origin/gh/guangyey/176/orig 2025-09-07T06:39:17.5676418Z * [new branch] 
gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-09-07T06:39:17.5676581Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-09-07T06:39:17.5676657Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-09-07T06:39:17.5676737Z * [new branch] gh/guangyey/181/base -> origin/gh/guangyey/181/base 2025-09-07T06:39:17.5676816Z * [new branch] gh/guangyey/181/head -> origin/gh/guangyey/181/head 2025-09-07T06:39:17.5676891Z * [new branch] gh/guangyey/181/orig -> origin/gh/guangyey/181/orig 2025-09-07T06:39:17.5676965Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-09-07T06:39:17.5677039Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-09-07T06:39:17.5677117Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-09-07T06:39:17.5677190Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-09-07T06:39:17.5677266Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-09-07T06:39:17.5677344Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-09-07T06:39:17.5677419Z * [new branch] gh/guangyey/184/base -> origin/gh/guangyey/184/base 2025-09-07T06:39:17.5677550Z * [new branch] gh/guangyey/184/head -> origin/gh/guangyey/184/head 2025-09-07T06:39:17.5677625Z * [new branch] gh/guangyey/184/orig -> origin/gh/guangyey/184/orig 2025-09-07T06:39:17.5677699Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-09-07T06:39:17.5677772Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-09-07T06:39:17.5677847Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-09-07T06:39:17.5677920Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-09-07T06:39:17.5678073Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-09-07T06:39:17.5678178Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 
2025-09-07T06:39:17.5679550Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-09-07T06:39:17.5679644Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-09-07T06:39:17.5679718Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-09-07T06:39:17.5679799Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-09-07T06:39:17.5679876Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-09-07T06:39:17.5679953Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-09-07T06:39:17.5680029Z * [new branch] gh/guangyey/189/base -> origin/gh/guangyey/189/base 2025-09-07T06:39:17.5680109Z * [new branch] gh/guangyey/189/head -> origin/gh/guangyey/189/head 2025-09-07T06:39:17.5680183Z * [new branch] gh/guangyey/189/orig -> origin/gh/guangyey/189/orig 2025-09-07T06:39:17.5680257Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-09-07T06:39:17.5680331Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-09-07T06:39:17.5680405Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-09-07T06:39:17.5680478Z * [new branch] gh/guangyey/191/base -> origin/gh/guangyey/191/base 2025-09-07T06:39:17.5680551Z * [new branch] gh/guangyey/191/head -> origin/gh/guangyey/191/head 2025-09-07T06:39:17.5680627Z * [new branch] gh/guangyey/191/orig -> origin/gh/guangyey/191/orig 2025-09-07T06:39:17.5680700Z * [new branch] gh/guangyey/192/base -> origin/gh/guangyey/192/base 2025-09-07T06:39:17.5680775Z * [new branch] gh/guangyey/192/head -> origin/gh/guangyey/192/head 2025-09-07T06:39:17.5680850Z * [new branch] gh/guangyey/192/orig -> origin/gh/guangyey/192/orig 2025-09-07T06:39:17.5680926Z * [new branch] gh/guangyey/193/base -> origin/gh/guangyey/193/base 2025-09-07T06:39:17.5680999Z * [new branch] gh/guangyey/193/head -> origin/gh/guangyey/193/head 2025-09-07T06:39:17.5681072Z * [new branch] gh/guangyey/193/orig -> 
origin/gh/guangyey/193/orig 2025-09-07T06:39:17.5681153Z * [new branch] gh/guangyey/194/base -> origin/gh/guangyey/194/base 2025-09-07T06:39:17.5681231Z * [new branch] gh/guangyey/194/head -> origin/gh/guangyey/194/head 2025-09-07T06:39:17.5681313Z * [new branch] gh/guangyey/194/orig -> origin/gh/guangyey/194/orig 2025-09-07T06:39:17.5681388Z * [new branch] gh/guangyey/195/base -> origin/gh/guangyey/195/base 2025-09-07T06:39:17.5681463Z * [new branch] gh/guangyey/195/head -> origin/gh/guangyey/195/head 2025-09-07T06:39:17.5681541Z * [new branch] gh/guangyey/195/orig -> origin/gh/guangyey/195/orig 2025-09-07T06:39:17.5681651Z * [new branch] gh/guangyey/196/base -> origin/gh/guangyey/196/base 2025-09-07T06:39:17.5681727Z * [new branch] gh/guangyey/196/head -> origin/gh/guangyey/196/head 2025-09-07T06:39:17.5681801Z * [new branch] gh/guangyey/196/orig -> origin/gh/guangyey/196/orig 2025-09-07T06:39:17.5681875Z * [new branch] gh/guangyey/197/base -> origin/gh/guangyey/197/base 2025-09-07T06:39:17.5681949Z * [new branch] gh/guangyey/197/head -> origin/gh/guangyey/197/head 2025-09-07T06:39:17.5682022Z * [new branch] gh/guangyey/197/orig -> origin/gh/guangyey/197/orig 2025-09-07T06:39:17.5682099Z * [new branch] gh/guangyey/198/base -> origin/gh/guangyey/198/base 2025-09-07T06:39:17.5682197Z * [new branch] gh/guangyey/198/head -> origin/gh/guangyey/198/head 2025-09-07T06:39:17.5682270Z * [new branch] gh/guangyey/198/orig -> origin/gh/guangyey/198/orig 2025-09-07T06:39:17.5682346Z * [new branch] gh/guangyey/199/base -> origin/gh/guangyey/199/base 2025-09-07T06:39:17.5682420Z * [new branch] gh/guangyey/199/head -> origin/gh/guangyey/199/head 2025-09-07T06:39:17.5682492Z * [new branch] gh/guangyey/199/orig -> origin/gh/guangyey/199/orig 2025-09-07T06:39:17.5683971Z * [new branch] gh/guangyey/200/base -> origin/gh/guangyey/200/base 2025-09-07T06:39:17.5684059Z * [new branch] gh/guangyey/200/head -> origin/gh/guangyey/200/head 2025-09-07T06:39:17.5684132Z * [new branch] 
gh/guangyey/200/orig -> origin/gh/guangyey/200/orig 2025-09-07T06:39:17.5684215Z * [new branch] gh/guangyey/201/base -> origin/gh/guangyey/201/base 2025-09-07T06:39:17.5684293Z * [new branch] gh/guangyey/201/head -> origin/gh/guangyey/201/head 2025-09-07T06:39:17.5684370Z * [new branch] gh/guangyey/201/orig -> origin/gh/guangyey/201/orig 2025-09-07T06:39:17.5684450Z * [new branch] gh/guangyey/202/base -> origin/gh/guangyey/202/base 2025-09-07T06:39:17.5684529Z * [new branch] gh/guangyey/202/head -> origin/gh/guangyey/202/head 2025-09-07T06:39:17.5684749Z * [new branch] gh/guangyey/202/orig -> origin/gh/guangyey/202/orig 2025-09-07T06:39:17.5684965Z * [new branch] gh/guangyey/203/base -> origin/gh/guangyey/203/base 2025-09-07T06:39:17.5685153Z * [new branch] gh/guangyey/203/head -> origin/gh/guangyey/203/head 2025-09-07T06:39:17.5685346Z * [new branch] gh/guangyey/203/orig -> origin/gh/guangyey/203/orig 2025-09-07T06:39:17.5685541Z * [new branch] gh/guangyey/204/base -> origin/gh/guangyey/204/base 2025-09-07T06:39:17.5685731Z * [new branch] gh/guangyey/204/head -> origin/gh/guangyey/204/head 2025-09-07T06:39:17.5685918Z * [new branch] gh/guangyey/204/orig -> origin/gh/guangyey/204/orig 2025-09-07T06:39:17.5686108Z * [new branch] gh/guangyey/205/base -> origin/gh/guangyey/205/base 2025-09-07T06:39:17.5688687Z * [new branch] gh/guangyey/205/head -> origin/gh/guangyey/205/head 2025-09-07T06:39:17.5688879Z * [new branch] gh/guangyey/205/orig -> origin/gh/guangyey/205/orig 2025-09-07T06:39:17.5689064Z * [new branch] gh/guangyey/206/base -> origin/gh/guangyey/206/base 2025-09-07T06:39:17.5689252Z * [new branch] gh/guangyey/206/head -> origin/gh/guangyey/206/head 2025-09-07T06:39:17.5689441Z * [new branch] gh/guangyey/206/orig -> origin/gh/guangyey/206/orig 2025-09-07T06:39:17.5689632Z * [new branch] gh/guangyey/207/base -> origin/gh/guangyey/207/base 2025-09-07T06:39:17.5689815Z * [new branch] gh/guangyey/207/head -> origin/gh/guangyey/207/head 
2025-09-07T06:39:17.5690056Z * [new branch] gh/guangyey/207/orig -> origin/gh/guangyey/207/orig 2025-09-07T06:39:17.5690251Z * [new branch] gh/guangyey/79/base -> origin/gh/guangyey/79/base 2025-09-07T06:39:17.5690436Z * [new branch] gh/guangyey/79/head -> origin/gh/guangyey/79/head 2025-09-07T06:39:17.5690617Z * [new branch] gh/guangyey/79/orig -> origin/gh/guangyey/79/orig 2025-09-07T06:39:17.5690803Z * [new branch] gh/guangyey/89/base -> origin/gh/guangyey/89/base 2025-09-07T06:39:17.5690997Z * [new branch] gh/guangyey/89/head -> origin/gh/guangyey/89/head 2025-09-07T06:39:17.5691187Z * [new branch] gh/guangyey/89/orig -> origin/gh/guangyey/89/orig 2025-09-07T06:39:17.5691429Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-09-07T06:39:17.5691652Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-09-07T06:39:17.5691875Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-09-07T06:39:17.5692550Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-09-07T06:39:17.5692771Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-09-07T06:39:17.5692987Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-09-07T06:39:17.5693239Z * [new branch] gh/guilhermeleobas/124/base -> origin/gh/guilhermeleobas/124/base 2025-09-07T06:39:17.5693527Z * [new branch] gh/guilhermeleobas/124/head -> origin/gh/guilhermeleobas/124/head 2025-09-07T06:39:17.5693861Z * [new branch] gh/guilhermeleobas/124/orig -> origin/gh/guilhermeleobas/124/orig 2025-09-07T06:39:17.5694162Z * [new branch] gh/guilhermeleobas/147/base -> origin/gh/guilhermeleobas/147/base 2025-09-07T06:39:17.5694410Z * [new branch] gh/guilhermeleobas/147/head -> origin/gh/guilhermeleobas/147/head 2025-09-07T06:39:17.5694652Z * [new branch] gh/guilhermeleobas/147/orig -> origin/gh/guilhermeleobas/147/orig 
2025-09-07T06:39:17.5694956Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-09-07T06:39:17.5695194Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-09-07T06:39:17.5695463Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-09-07T06:39:17.5695786Z * [new branch] gh/guilhermeleobas/163/base -> origin/gh/guilhermeleobas/163/base 2025-09-07T06:39:17.5696025Z * [new branch] gh/guilhermeleobas/163/head -> origin/gh/guilhermeleobas/163/head 2025-09-07T06:39:17.5696288Z * [new branch] gh/guilhermeleobas/163/orig -> origin/gh/guilhermeleobas/163/orig 2025-09-07T06:39:17.5696602Z * [new branch] gh/guilhermeleobas/164/base -> origin/gh/guilhermeleobas/164/base 2025-09-07T06:39:17.5696857Z * [new branch] gh/guilhermeleobas/164/head -> origin/gh/guilhermeleobas/164/head 2025-09-07T06:39:17.5697113Z * [new branch] gh/guilhermeleobas/164/orig -> origin/gh/guilhermeleobas/164/orig 2025-09-07T06:39:17.5697357Z * [new branch] gh/guilhermeleobas/165/base -> origin/gh/guilhermeleobas/165/base 2025-09-07T06:39:17.5697608Z * [new branch] gh/guilhermeleobas/165/head -> origin/gh/guilhermeleobas/165/head 2025-09-07T06:39:17.5697862Z * [new branch] gh/guilhermeleobas/165/orig -> origin/gh/guilhermeleobas/165/orig 2025-09-07T06:39:17.5698104Z * [new branch] gh/guilhermeleobas/166/base -> origin/gh/guilhermeleobas/166/base 2025-09-07T06:39:17.5698372Z * [new branch] gh/guilhermeleobas/166/head -> origin/gh/guilhermeleobas/166/head 2025-09-07T06:39:17.5698624Z * [new branch] gh/guilhermeleobas/166/orig -> origin/gh/guilhermeleobas/166/orig 2025-09-07T06:39:17.5698925Z * [new branch] gh/guilhermeleobas/167/base -> origin/gh/guilhermeleobas/167/base 2025-09-07T06:39:17.5699180Z * [new branch] gh/guilhermeleobas/167/head -> origin/gh/guilhermeleobas/167/head 2025-09-07T06:39:17.5699424Z * [new branch] gh/guilhermeleobas/167/orig -> origin/gh/guilhermeleobas/167/orig 
2025-09-07T06:39:17.5699681Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-09-07T06:39:17.5699912Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-09-07T06:39:17.5700169Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-09-07T06:39:17.5700451Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-09-07T06:39:17.5702714Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-09-07T06:39:17.5702946Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-09-07T06:39:17.5703164Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-09-07T06:39:17.5703375Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-09-07T06:39:17.5703587Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-09-07T06:39:17.5703795Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-09-07T06:39:17.5704004Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-09-07T06:39:17.5704217Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-09-07T06:39:17.5704429Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-09-07T06:39:17.5704641Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-09-07T06:39:17.5704851Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-09-07T06:39:17.5705062Z * [new branch] gh/guilhermeleobas/192/base -> origin/gh/guilhermeleobas/192/base 2025-09-07T06:39:17.5705274Z * [new branch] gh/guilhermeleobas/192/head -> origin/gh/guilhermeleobas/192/head 2025-09-07T06:39:17.5705483Z * [new branch] gh/guilhermeleobas/192/orig -> origin/gh/guilhermeleobas/192/orig 
2025-09-07T06:39:17.5705697Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-09-07T06:39:17.5705909Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-09-07T06:39:17.5706121Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-09-07T06:39:17.5706330Z * [new branch] gh/guilhermeleobas/194/base -> origin/gh/guilhermeleobas/194/base 2025-09-07T06:39:17.5706616Z * [new branch] gh/guilhermeleobas/194/head -> origin/gh/guilhermeleobas/194/head 2025-09-07T06:39:17.5706828Z * [new branch] gh/guilhermeleobas/194/orig -> origin/gh/guilhermeleobas/194/orig 2025-09-07T06:39:17.5707038Z * [new branch] gh/guilhermeleobas/203/base -> origin/gh/guilhermeleobas/203/base 2025-09-07T06:39:17.5707254Z * [new branch] gh/guilhermeleobas/203/head -> origin/gh/guilhermeleobas/203/head 2025-09-07T06:39:17.5707466Z * [new branch] gh/guilhermeleobas/203/orig -> origin/gh/guilhermeleobas/203/orig 2025-09-07T06:39:17.5707684Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-09-07T06:39:17.5707893Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-09-07T06:39:17.5708107Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-09-07T06:39:17.5708377Z * [new branch] gh/guilhermeleobas/205/base -> origin/gh/guilhermeleobas/205/base 2025-09-07T06:39:17.5708590Z * [new branch] gh/guilhermeleobas/205/head -> origin/gh/guilhermeleobas/205/head 2025-09-07T06:39:17.5708807Z * [new branch] gh/guilhermeleobas/205/orig -> origin/gh/guilhermeleobas/205/orig 2025-09-07T06:39:17.5709021Z * [new branch] gh/guilhermeleobas/209/base -> origin/gh/guilhermeleobas/209/base 2025-09-07T06:39:17.5709230Z * [new branch] gh/guilhermeleobas/209/head -> origin/gh/guilhermeleobas/209/head 2025-09-07T06:39:17.5709444Z * [new branch] gh/guilhermeleobas/209/orig -> origin/gh/guilhermeleobas/209/orig 
2025-09-07T06:39:17.5709707Z * [new branch] gh/guilhermeleobas/210/base -> origin/gh/guilhermeleobas/210/base 2025-09-07T06:39:17.5709923Z * [new branch] gh/guilhermeleobas/210/head -> origin/gh/guilhermeleobas/210/head 2025-09-07T06:39:17.5710140Z * [new branch] gh/guilhermeleobas/210/orig -> origin/gh/guilhermeleobas/210/orig 2025-09-07T06:39:17.5710352Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-09-07T06:39:17.5710562Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-09-07T06:39:17.5710771Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-09-07T06:39:17.5710986Z * [new branch] gh/guilhermeleobas/214/base -> origin/gh/guilhermeleobas/214/base 2025-09-07T06:39:17.5711199Z * [new branch] gh/guilhermeleobas/214/head -> origin/gh/guilhermeleobas/214/head 2025-09-07T06:39:17.5711412Z * [new branch] gh/guilhermeleobas/214/orig -> origin/gh/guilhermeleobas/214/orig 2025-09-07T06:39:17.5711626Z * [new branch] gh/guilhermeleobas/215/base -> origin/gh/guilhermeleobas/215/base 2025-09-07T06:39:17.5711840Z * [new branch] gh/guilhermeleobas/215/head -> origin/gh/guilhermeleobas/215/head 2025-09-07T06:39:17.5712054Z * [new branch] gh/guilhermeleobas/215/orig -> origin/gh/guilhermeleobas/215/orig 2025-09-07T06:39:17.5712262Z * [new branch] gh/guilhermeleobas/216/base -> origin/gh/guilhermeleobas/216/base 2025-09-07T06:39:17.5712470Z * [new branch] gh/guilhermeleobas/216/head -> origin/gh/guilhermeleobas/216/head 2025-09-07T06:39:17.5712680Z * [new branch] gh/guilhermeleobas/216/orig -> origin/gh/guilhermeleobas/216/orig 2025-09-07T06:39:17.5712889Z * [new branch] gh/guilhermeleobas/217/base -> origin/gh/guilhermeleobas/217/base 2025-09-07T06:39:17.5713100Z * [new branch] gh/guilhermeleobas/217/head -> origin/gh/guilhermeleobas/217/head 2025-09-07T06:39:17.5713309Z * [new branch] gh/guilhermeleobas/217/orig -> origin/gh/guilhermeleobas/217/orig 
2025-09-07T06:39:17.5713522Z * [new branch] gh/guilhermeleobas/219/base -> origin/gh/guilhermeleobas/219/base 2025-09-07T06:39:17.5713735Z * [new branch] gh/guilhermeleobas/219/head -> origin/gh/guilhermeleobas/219/head 2025-09-07T06:39:17.5713944Z * [new branch] gh/guilhermeleobas/219/orig -> origin/gh/guilhermeleobas/219/orig 2025-09-07T06:39:17.5714154Z * [new branch] gh/guilhermeleobas/220/base -> origin/gh/guilhermeleobas/220/base 2025-09-07T06:39:17.5714367Z * [new branch] gh/guilhermeleobas/220/head -> origin/gh/guilhermeleobas/220/head 2025-09-07T06:39:17.5714580Z * [new branch] gh/guilhermeleobas/220/orig -> origin/gh/guilhermeleobas/220/orig 2025-09-07T06:39:17.5714796Z * [new branch] gh/guilhermeleobas/221/base -> origin/gh/guilhermeleobas/221/base 2025-09-07T06:39:17.5715008Z * [new branch] gh/guilhermeleobas/221/head -> origin/gh/guilhermeleobas/221/head 2025-09-07T06:39:17.5715218Z * [new branch] gh/guilhermeleobas/221/orig -> origin/gh/guilhermeleobas/221/orig 2025-09-07T06:39:17.5715473Z * [new branch] gh/guilhermeleobas/222/base -> origin/gh/guilhermeleobas/222/base 2025-09-07T06:39:17.5715681Z * [new branch] gh/guilhermeleobas/222/head -> origin/gh/guilhermeleobas/222/head 2025-09-07T06:39:17.5715890Z * [new branch] gh/guilhermeleobas/222/orig -> origin/gh/guilhermeleobas/222/orig 2025-09-07T06:39:17.5716098Z * [new branch] gh/guilhermeleobas/223/base -> origin/gh/guilhermeleobas/223/base 2025-09-07T06:39:17.5716312Z * [new branch] gh/guilhermeleobas/223/head -> origin/gh/guilhermeleobas/223/head 2025-09-07T06:39:17.5716593Z * [new branch] gh/guilhermeleobas/223/orig -> origin/gh/guilhermeleobas/223/orig 2025-09-07T06:39:17.5716844Z * [new branch] gh/guilhermeleobas/224/base -> origin/gh/guilhermeleobas/224/base 2025-09-07T06:39:17.5717053Z * [new branch] gh/guilhermeleobas/224/head -> origin/gh/guilhermeleobas/224/head 2025-09-07T06:39:17.5717263Z * [new branch] gh/guilhermeleobas/224/orig -> origin/gh/guilhermeleobas/224/orig 
2025-09-07T06:39:17.5717472Z * [new branch] gh/guilhermeleobas/225/base -> origin/gh/guilhermeleobas/225/base 2025-09-07T06:39:17.5717680Z * [new branch] gh/guilhermeleobas/225/head -> origin/gh/guilhermeleobas/225/head 2025-09-07T06:39:17.5717888Z * [new branch] gh/guilhermeleobas/225/orig -> origin/gh/guilhermeleobas/225/orig 2025-09-07T06:39:17.5718197Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-09-07T06:39:17.5718432Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-09-07T06:39:17.5718688Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-09-07T06:39:17.5718970Z * [new branch] gh/guilhermeleobas/227/base -> origin/gh/guilhermeleobas/227/base 2025-09-07T06:39:17.5719223Z * [new branch] gh/guilhermeleobas/227/head -> origin/gh/guilhermeleobas/227/head 2025-09-07T06:39:17.5719490Z * [new branch] gh/guilhermeleobas/227/orig -> origin/gh/guilhermeleobas/227/orig 2025-09-07T06:39:17.5719751Z * [new branch] gh/guilhermeleobas/228/base -> origin/gh/guilhermeleobas/228/base 2025-09-07T06:39:17.5719995Z * [new branch] gh/guilhermeleobas/228/head -> origin/gh/guilhermeleobas/228/head 2025-09-07T06:39:17.5720249Z * [new branch] gh/guilhermeleobas/228/orig -> origin/gh/guilhermeleobas/228/orig 2025-09-07T06:39:17.5720486Z * [new branch] gh/guilhermeleobas/229/base -> origin/gh/guilhermeleobas/229/base 2025-09-07T06:39:17.5720739Z * [new branch] gh/guilhermeleobas/229/head -> origin/gh/guilhermeleobas/229/head 2025-09-07T06:39:17.5721013Z * [new branch] gh/guilhermeleobas/229/orig -> origin/gh/guilhermeleobas/229/orig 2025-09-07T06:39:17.5721245Z * [new branch] gh/guilhermeleobas/230/base -> origin/gh/guilhermeleobas/230/base 2025-09-07T06:39:17.5721497Z * [new branch] gh/guilhermeleobas/230/head -> origin/gh/guilhermeleobas/230/head 2025-09-07T06:39:17.5721739Z * [new branch] gh/guilhermeleobas/230/orig -> origin/gh/guilhermeleobas/230/orig 
2025-09-07T06:39:17.5721989Z * [new branch] gh/guilhermeleobas/231/base -> origin/gh/guilhermeleobas/231/base 2025-09-07T06:39:17.5722236Z * [new branch] gh/guilhermeleobas/231/head -> origin/gh/guilhermeleobas/231/head 2025-09-07T06:39:17.5722475Z * [new branch] gh/guilhermeleobas/231/orig -> origin/gh/guilhermeleobas/231/orig 2025-09-07T06:39:17.5722728Z * [new branch] gh/guilhermeleobas/232/base -> origin/gh/guilhermeleobas/232/base 2025-09-07T06:39:17.5722986Z * [new branch] gh/guilhermeleobas/232/head -> origin/gh/guilhermeleobas/232/head 2025-09-07T06:39:17.5723249Z * [new branch] gh/guilhermeleobas/232/orig -> origin/gh/guilhermeleobas/232/orig 2025-09-07T06:39:17.5723548Z * [new branch] gh/guilhermeleobas/233/base -> origin/gh/guilhermeleobas/233/base 2025-09-07T06:39:17.5723781Z * [new branch] gh/guilhermeleobas/233/head -> origin/gh/guilhermeleobas/233/head 2025-09-07T06:39:17.5724055Z * [new branch] gh/guilhermeleobas/233/orig -> origin/gh/guilhermeleobas/233/orig 2025-09-07T06:39:17.5724302Z * [new branch] gh/guilhermeleobas/234/base -> origin/gh/guilhermeleobas/234/base 2025-09-07T06:39:17.5724548Z * [new branch] gh/guilhermeleobas/234/head -> origin/gh/guilhermeleobas/234/head 2025-09-07T06:39:17.5724814Z * [new branch] gh/guilhermeleobas/234/orig -> origin/gh/guilhermeleobas/234/orig 2025-09-07T06:39:17.5725270Z * [new branch] gh/guilhermeleobas/235/base -> origin/gh/guilhermeleobas/235/base 2025-09-07T06:39:17.5725517Z * [new branch] gh/guilhermeleobas/235/head -> origin/gh/guilhermeleobas/235/head 2025-09-07T06:39:17.5725788Z * [new branch] gh/guilhermeleobas/235/orig -> origin/gh/guilhermeleobas/235/orig 2025-09-07T06:39:17.5726021Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-09-07T06:39:17.5726268Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-09-07T06:39:17.5726606Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 
2025-09-07T06:39:17.5726855Z * [new branch] gh/guilhermeleobas/237/base -> origin/gh/guilhermeleobas/237/base 2025-09-07T06:39:17.5727086Z * [new branch] gh/guilhermeleobas/237/head -> origin/gh/guilhermeleobas/237/head 2025-09-07T06:39:17.5727359Z * [new branch] gh/guilhermeleobas/237/orig -> origin/gh/guilhermeleobas/237/orig 2025-09-07T06:39:17.5727608Z * [new branch] gh/guilhermeleobas/238/base -> origin/gh/guilhermeleobas/238/base 2025-09-07T06:39:17.5727845Z * [new branch] gh/guilhermeleobas/238/head -> origin/gh/guilhermeleobas/238/head 2025-09-07T06:39:17.5728120Z * [new branch] gh/guilhermeleobas/238/orig -> origin/gh/guilhermeleobas/238/orig 2025-09-07T06:39:17.5728366Z * [new branch] gh/guilhermeleobas/239/base -> origin/gh/guilhermeleobas/239/base 2025-09-07T06:39:17.5728598Z * [new branch] gh/guilhermeleobas/239/head -> origin/gh/guilhermeleobas/239/head 2025-09-07T06:39:17.5728864Z * [new branch] gh/guilhermeleobas/239/orig -> origin/gh/guilhermeleobas/239/orig 2025-09-07T06:39:17.5729101Z * [new branch] gh/guilhermeleobas/240/base -> origin/gh/guilhermeleobas/240/base 2025-09-07T06:39:17.5729347Z * [new branch] gh/guilhermeleobas/240/head -> origin/gh/guilhermeleobas/240/head 2025-09-07T06:39:17.5729617Z * [new branch] gh/guilhermeleobas/240/orig -> origin/gh/guilhermeleobas/240/orig 2025-09-07T06:39:17.5729855Z * [new branch] gh/guilhermeleobas/241/base -> origin/gh/guilhermeleobas/241/base 2025-09-07T06:39:17.5730110Z * [new branch] gh/guilhermeleobas/241/head -> origin/gh/guilhermeleobas/241/head 2025-09-07T06:39:17.5730376Z * [new branch] gh/guilhermeleobas/241/orig -> origin/gh/guilhermeleobas/241/orig 2025-09-07T06:39:17.5730625Z * [new branch] gh/guilhermeleobas/242/base -> origin/gh/guilhermeleobas/242/base 2025-09-07T06:39:17.5730869Z * [new branch] gh/guilhermeleobas/242/head -> origin/gh/guilhermeleobas/242/head 2025-09-07T06:39:17.5731125Z * [new branch] gh/guilhermeleobas/242/orig -> origin/gh/guilhermeleobas/242/orig 
2025-09-07T06:39:17.5731372Z * [new branch] gh/guilhermeleobas/243/base -> origin/gh/guilhermeleobas/243/base 2025-09-07T06:39:17.5731610Z * [new branch] gh/guilhermeleobas/243/head -> origin/gh/guilhermeleobas/243/head 2025-09-07T06:39:17.5731871Z * [new branch] gh/guilhermeleobas/243/orig -> origin/gh/guilhermeleobas/243/orig 2025-09-07T06:39:17.5732178Z * [new branch] gh/guilhermeleobas/244/base -> origin/gh/guilhermeleobas/244/base 2025-09-07T06:39:17.5732410Z * [new branch] gh/guilhermeleobas/244/head -> origin/gh/guilhermeleobas/244/head 2025-09-07T06:39:17.5732670Z * [new branch] gh/guilhermeleobas/244/orig -> origin/gh/guilhermeleobas/244/orig 2025-09-07T06:39:17.5732903Z * [new branch] gh/guilhermeleobas/245/base -> origin/gh/guilhermeleobas/245/base 2025-09-07T06:39:17.5733152Z * [new branch] gh/guilhermeleobas/245/head -> origin/gh/guilhermeleobas/245/head 2025-09-07T06:39:17.5733414Z * [new branch] gh/guilhermeleobas/245/orig -> origin/gh/guilhermeleobas/245/orig 2025-09-07T06:39:17.5733692Z * [new branch] gh/guilhermeleobas/73/base -> origin/gh/guilhermeleobas/73/base 2025-09-07T06:39:17.5733936Z * [new branch] gh/guilhermeleobas/73/head -> origin/gh/guilhermeleobas/73/head 2025-09-07T06:39:17.5734198Z * [new branch] gh/guilhermeleobas/73/orig -> origin/gh/guilhermeleobas/73/orig 2025-09-07T06:39:17.5734448Z * [new branch] gh/henrylhtsang/140/base -> origin/gh/henrylhtsang/140/base 2025-09-07T06:39:17.5734683Z * [new branch] gh/henrylhtsang/140/head -> origin/gh/henrylhtsang/140/head 2025-09-07T06:39:17.5734927Z * [new branch] gh/henrylhtsang/140/orig -> origin/gh/henrylhtsang/140/orig 2025-09-07T06:39:17.5737234Z * [new branch] gh/henrylhtsang/141/base -> origin/gh/henrylhtsang/141/base 2025-09-07T06:39:17.5737460Z * [new branch] gh/henrylhtsang/141/head -> origin/gh/henrylhtsang/141/head 2025-09-07T06:39:17.5737666Z * [new branch] gh/henrylhtsang/141/orig -> origin/gh/henrylhtsang/141/orig 2025-09-07T06:39:17.5737867Z * [new branch] 
gh/henrylhtsang/142/base -> origin/gh/henrylhtsang/142/base 2025-09-07T06:39:17.5738072Z * [new branch] gh/henrylhtsang/142/head -> origin/gh/henrylhtsang/142/head 2025-09-07T06:39:17.5738270Z * [new branch] gh/henrylhtsang/142/orig -> origin/gh/henrylhtsang/142/orig 2025-09-07T06:39:17.5738470Z * [new branch] gh/henrylhtsang/143/base -> origin/gh/henrylhtsang/143/base 2025-09-07T06:39:17.5738668Z * [new branch] gh/henrylhtsang/143/head -> origin/gh/henrylhtsang/143/head 2025-09-07T06:39:17.5738867Z * [new branch] gh/henrylhtsang/143/orig -> origin/gh/henrylhtsang/143/orig 2025-09-07T06:39:17.5739067Z * [new branch] gh/henrylhtsang/144/base -> origin/gh/henrylhtsang/144/base 2025-09-07T06:39:17.5739267Z * [new branch] gh/henrylhtsang/144/head -> origin/gh/henrylhtsang/144/head 2025-09-07T06:39:17.5739467Z * [new branch] gh/henrylhtsang/144/orig -> origin/gh/henrylhtsang/144/orig 2025-09-07T06:39:17.5739671Z * [new branch] gh/henrylhtsang/145/base -> origin/gh/henrylhtsang/145/base 2025-09-07T06:39:17.5739874Z * [new branch] gh/henrylhtsang/145/head -> origin/gh/henrylhtsang/145/head 2025-09-07T06:39:17.5740073Z * [new branch] gh/henrylhtsang/145/orig -> origin/gh/henrylhtsang/145/orig 2025-09-07T06:39:17.5740271Z * [new branch] gh/henrylhtsang/146/base -> origin/gh/henrylhtsang/146/base 2025-09-07T06:39:17.5740470Z * [new branch] gh/henrylhtsang/146/head -> origin/gh/henrylhtsang/146/head 2025-09-07T06:39:17.5740668Z * [new branch] gh/henrylhtsang/146/orig -> origin/gh/henrylhtsang/146/orig 2025-09-07T06:39:17.5740865Z * [new branch] gh/henrylhtsang/147/base -> origin/gh/henrylhtsang/147/base 2025-09-07T06:39:17.5741063Z * [new branch] gh/henrylhtsang/147/head -> origin/gh/henrylhtsang/147/head 2025-09-07T06:39:17.5741261Z * [new branch] gh/henrylhtsang/147/orig -> origin/gh/henrylhtsang/147/orig 2025-09-07T06:39:17.5741521Z * [new branch] gh/henrylhtsang/148/base -> origin/gh/henrylhtsang/148/base 2025-09-07T06:39:17.5741720Z * [new branch] 
gh/henrylhtsang/148/head -> origin/gh/henrylhtsang/148/head 2025-09-07T06:39:17.5741923Z * [new branch] gh/henrylhtsang/148/orig -> origin/gh/henrylhtsang/148/orig 2025-09-07T06:39:17.5742122Z * [new branch] gh/henrylhtsang/149/base -> origin/gh/henrylhtsang/149/base 2025-09-07T06:39:17.5742321Z * [new branch] gh/henrylhtsang/149/head -> origin/gh/henrylhtsang/149/head 2025-09-07T06:39:17.5742521Z * [new branch] gh/henrylhtsang/149/orig -> origin/gh/henrylhtsang/149/orig 2025-09-07T06:39:17.5742716Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-09-07T06:39:17.5742932Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-09-07T06:39:17.5743108Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-09-07T06:39:17.5743284Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-09-07T06:39:17.5743460Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-09-07T06:39:17.5743631Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-09-07T06:39:17.5743810Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-09-07T06:39:17.5743979Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-09-07T06:39:17.5744158Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-09-07T06:39:17.5744344Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-09-07T06:39:17.5744525Z * [new branch] gh/isuruf/141/base -> origin/gh/isuruf/141/base 2025-09-07T06:39:17.5744703Z * [new branch] gh/isuruf/141/head -> origin/gh/isuruf/141/head 2025-09-07T06:39:17.5744880Z * [new branch] gh/isuruf/141/orig -> origin/gh/isuruf/141/orig 2025-09-07T06:39:17.5745058Z * [new branch] gh/isuruf/142/base -> origin/gh/isuruf/142/base 2025-09-07T06:39:17.5745233Z * [new branch] gh/isuruf/142/head -> origin/gh/isuruf/142/head 2025-09-07T06:39:17.5745410Z * [new branch] gh/isuruf/142/orig -> origin/gh/isuruf/142/orig 2025-09-07T06:39:17.5745588Z * [new branch] gh/isuruf/143/base -> 
origin/gh/isuruf/143/base 2025-09-07T06:39:17.5745764Z * [new branch] gh/isuruf/143/head -> origin/gh/isuruf/143/head 2025-09-07T06:39:17.5745944Z * [new branch] gh/isuruf/143/orig -> origin/gh/isuruf/143/orig 2025-09-07T06:39:17.5746120Z * [new branch] gh/isuruf/144/base -> origin/gh/isuruf/144/base 2025-09-07T06:39:17.5746297Z * [new branch] gh/isuruf/144/head -> origin/gh/isuruf/144/head 2025-09-07T06:39:17.5746475Z * [new branch] gh/isuruf/144/orig -> origin/gh/isuruf/144/orig 2025-09-07T06:39:17.5748259Z * [new branch] gh/isuruf/145/base -> origin/gh/isuruf/145/base 2025-09-07T06:39:17.5748452Z * [new branch] gh/isuruf/145/head -> origin/gh/isuruf/145/head 2025-09-07T06:39:17.5748629Z * [new branch] gh/isuruf/145/orig -> origin/gh/isuruf/145/orig 2025-09-07T06:39:17.5748809Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-09-07T06:39:17.5748985Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-09-07T06:39:17.5749162Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-09-07T06:39:17.5749339Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-09-07T06:39:17.5749517Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-09-07T06:39:17.5749745Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-09-07T06:39:17.5749931Z * [new branch] gh/jamesjwu/150/base -> origin/gh/jamesjwu/150/base 2025-09-07T06:39:17.5750122Z * [new branch] gh/jamesjwu/150/head -> origin/gh/jamesjwu/150/head 2025-09-07T06:39:17.5750309Z * [new branch] gh/jamesjwu/150/orig -> origin/gh/jamesjwu/150/orig 2025-09-07T06:39:17.5751816Z * [new branch] gh/jamesjwu/154/base -> origin/gh/jamesjwu/154/base 2025-09-07T06:39:17.5752023Z * [new branch] gh/jamesjwu/154/head -> origin/gh/jamesjwu/154/head 2025-09-07T06:39:17.5752269Z * [new branch] gh/jamesjwu/154/orig -> origin/gh/jamesjwu/154/orig 2025-09-07T06:39:17.5752455Z * [new branch] gh/jamesjwu/155/base -> origin/gh/jamesjwu/155/base 
2025-09-07T06:39:17.5752640Z * [new branch] gh/jamesjwu/155/head -> origin/gh/jamesjwu/155/head 2025-09-07T06:39:17.5752827Z * [new branch] gh/jamesjwu/155/orig -> origin/gh/jamesjwu/155/orig 2025-09-07T06:39:17.5753012Z * [new branch] gh/jamesjwu/159/base -> origin/gh/jamesjwu/159/base 2025-09-07T06:39:17.5753194Z * [new branch] gh/jamesjwu/159/head -> origin/gh/jamesjwu/159/head 2025-09-07T06:39:17.5753379Z * [new branch] gh/jamesjwu/159/orig -> origin/gh/jamesjwu/159/orig 2025-09-07T06:39:17.5753564Z * [new branch] gh/jamesjwu/163/base -> origin/gh/jamesjwu/163/base 2025-09-07T06:39:17.5753749Z * [new branch] gh/jamesjwu/163/head -> origin/gh/jamesjwu/163/head 2025-09-07T06:39:17.5753934Z * [new branch] gh/jamesjwu/163/orig -> origin/gh/jamesjwu/163/orig 2025-09-07T06:39:17.5755300Z * [new branch] gh/jamesjwu/171/base -> origin/gh/jamesjwu/171/base 2025-09-07T06:39:17.5755492Z * [new branch] gh/jamesjwu/171/head -> origin/gh/jamesjwu/171/head 2025-09-07T06:39:17.5755676Z * [new branch] gh/jamesjwu/171/orig -> origin/gh/jamesjwu/171/orig 2025-09-07T06:39:17.5755859Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-09-07T06:39:17.5756043Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-09-07T06:39:17.5756225Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-09-07T06:39:17.5756408Z * [new branch] gh/jamesjwu/181/base -> origin/gh/jamesjwu/181/base 2025-09-07T06:39:17.5756668Z * [new branch] gh/jamesjwu/181/head -> origin/gh/jamesjwu/181/head 2025-09-07T06:39:17.5756854Z * [new branch] gh/jamesjwu/181/orig -> origin/gh/jamesjwu/181/orig 2025-09-07T06:39:17.5757038Z * [new branch] gh/jamesjwu/182/base -> origin/gh/jamesjwu/182/base 2025-09-07T06:39:17.5757224Z * [new branch] gh/jamesjwu/182/head -> origin/gh/jamesjwu/182/head 2025-09-07T06:39:17.5757408Z * [new branch] gh/jamesjwu/182/orig -> origin/gh/jamesjwu/182/orig 2025-09-07T06:39:17.5757591Z * [new branch] gh/jamesjwu/183/base -> 
origin/gh/jamesjwu/183/base 2025-09-07T06:39:17.5758982Z * [new branch] gh/jamesjwu/183/head -> origin/gh/jamesjwu/183/head 2025-09-07T06:39:17.5759168Z * [new branch] gh/jamesjwu/183/orig -> origin/gh/jamesjwu/183/orig 2025-09-07T06:39:17.5759351Z * [new branch] gh/jamesjwu/184/base -> origin/gh/jamesjwu/184/base 2025-09-07T06:39:17.5759534Z * [new branch] gh/jamesjwu/184/head -> origin/gh/jamesjwu/184/head 2025-09-07T06:39:17.5759720Z * [new branch] gh/jamesjwu/184/orig -> origin/gh/jamesjwu/184/orig 2025-09-07T06:39:17.5759904Z * [new branch] gh/jamesjwu/185/base -> origin/gh/jamesjwu/185/base 2025-09-07T06:39:17.5760149Z * [new branch] gh/jamesjwu/185/head -> origin/gh/jamesjwu/185/head 2025-09-07T06:39:17.5760333Z * [new branch] gh/jamesjwu/185/orig -> origin/gh/jamesjwu/185/orig 2025-09-07T06:39:17.5760516Z * [new branch] gh/jamesjwu/186/base -> origin/gh/jamesjwu/186/base 2025-09-07T06:39:17.5760701Z * [new branch] gh/jamesjwu/186/head -> origin/gh/jamesjwu/186/head 2025-09-07T06:39:17.5760886Z * [new branch] gh/jamesjwu/186/orig -> origin/gh/jamesjwu/186/orig 2025-09-07T06:39:17.5761069Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-09-07T06:39:17.5762396Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-09-07T06:39:17.5762583Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-09-07T06:39:17.5762767Z * [new branch] gh/jamesjwu/188/base -> origin/gh/jamesjwu/188/base 2025-09-07T06:39:17.5762952Z * [new branch] gh/jamesjwu/188/head -> origin/gh/jamesjwu/188/head 2025-09-07T06:39:17.5763136Z * [new branch] gh/jamesjwu/188/orig -> origin/gh/jamesjwu/188/orig 2025-09-07T06:39:17.5763318Z * [new branch] gh/jamesjwu/189/base -> origin/gh/jamesjwu/189/base 2025-09-07T06:39:17.5763503Z * [new branch] gh/jamesjwu/189/head -> origin/gh/jamesjwu/189/head 2025-09-07T06:39:17.5763687Z * [new branch] gh/jamesjwu/189/orig -> origin/gh/jamesjwu/189/orig 2025-09-07T06:39:17.5763871Z * [new branch] 
gh/jamesjwu/190/base -> origin/gh/jamesjwu/190/base 2025-09-07T06:39:17.5764055Z * [new branch] gh/jamesjwu/190/head -> origin/gh/jamesjwu/190/head 2025-09-07T06:39:17.5764240Z * [new branch] gh/jamesjwu/190/orig -> origin/gh/jamesjwu/190/orig 2025-09-07T06:39:17.5764430Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-09-07T06:39:17.5765681Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-09-07T06:39:17.5765870Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-09-07T06:39:17.5766051Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-09-07T06:39:17.5766231Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-09-07T06:39:17.5766412Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-09-07T06:39:17.5766653Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-09-07T06:39:17.5766836Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-09-07T06:39:17.5767017Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-09-07T06:39:17.5767199Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-09-07T06:39:17.5767382Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-09-07T06:39:17.5767562Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-09-07T06:39:17.5767743Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-09-07T06:39:17.5769008Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-09-07T06:39:17.5769192Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-09-07T06:39:17.5769374Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-09-07T06:39:17.5769555Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-09-07T06:39:17.5769734Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-09-07T06:39:17.5769996Z * [new branch] gh/jamesjwu/61/base 
-> origin/gh/jamesjwu/61/base 2025-09-07T06:39:17.5770176Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-09-07T06:39:17.5770355Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-09-07T06:39:17.5770535Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-09-07T06:39:17.5770715Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-09-07T06:39:17.5770894Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-09-07T06:39:17.5771074Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-09-07T06:39:17.5771305Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-09-07T06:39:17.5772599Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-09-07T06:39:17.5772783Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-09-07T06:39:17.5772965Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-09-07T06:39:17.5773148Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-09-07T06:39:17.5773331Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-09-07T06:39:17.5773512Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-09-07T06:39:17.5773694Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-09-07T06:39:17.5773878Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-09-07T06:39:17.5774059Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-09-07T06:39:17.5774240Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-09-07T06:39:17.5774423Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-09-07T06:39:17.5774605Z * [new branch] gh/janeyx99/296/base -> origin/gh/janeyx99/296/base 2025-09-07T06:39:17.5775851Z * [new branch] gh/janeyx99/296/head -> origin/gh/janeyx99/296/head 2025-09-07T06:39:17.5776033Z * [new branch] gh/janeyx99/296/orig -> 
origin/gh/janeyx99/296/orig 2025-09-07T06:39:17.5776215Z * [new branch] gh/janeyx99/297/base -> origin/gh/janeyx99/297/base 2025-09-07T06:39:17.5776396Z * [new branch] gh/janeyx99/297/head -> origin/gh/janeyx99/297/head 2025-09-07T06:39:17.5776647Z * [new branch] gh/janeyx99/297/orig -> origin/gh/janeyx99/297/orig 2025-09-07T06:39:17.5776829Z * [new branch] gh/janeyx99/298/base -> origin/gh/janeyx99/298/base 2025-09-07T06:39:17.5777011Z * [new branch] gh/janeyx99/298/head -> origin/gh/janeyx99/298/head 2025-09-07T06:39:17.5777194Z * [new branch] gh/janeyx99/298/orig -> origin/gh/janeyx99/298/orig 2025-09-07T06:39:17.5777374Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-09-07T06:39:17.5777557Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-09-07T06:39:17.5777739Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-09-07T06:39:17.5777921Z * [new branch] gh/janeyx99/300/base -> origin/gh/janeyx99/300/base 2025-09-07T06:39:17.5779186Z * [new branch] gh/janeyx99/300/head -> origin/gh/janeyx99/300/head 2025-09-07T06:39:17.5779372Z * [new branch] gh/janeyx99/300/orig -> origin/gh/janeyx99/300/orig 2025-09-07T06:39:17.5779554Z * [new branch] gh/janeyx99/301/base -> origin/gh/janeyx99/301/base 2025-09-07T06:39:17.5779801Z * [new branch] gh/janeyx99/301/head -> origin/gh/janeyx99/301/head 2025-09-07T06:39:17.5779982Z * [new branch] gh/janeyx99/301/orig -> origin/gh/janeyx99/301/orig 2025-09-07T06:39:17.5780164Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-09-07T06:39:17.5780344Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-09-07T06:39:17.5780525Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-09-07T06:39:17.5780706Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-09-07T06:39:17.5780887Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-09-07T06:39:17.5781117Z * [new branch] 
gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-09-07T06:39:17.5781296Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-09-07T06:39:17.5781478Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-09-07T06:39:17.5782782Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-09-07T06:39:17.5782960Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-09-07T06:39:17.5783137Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-09-07T06:39:17.5783312Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-09-07T06:39:17.5783489Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-09-07T06:39:17.5783667Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-09-07T06:39:17.5783842Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-09-07T06:39:17.5784018Z * [new branch] gh/jansel/531/base -> origin/gh/jansel/531/base 2025-09-07T06:39:17.5784195Z * [new branch] gh/jansel/531/head -> origin/gh/jansel/531/head 2025-09-07T06:39:17.5784371Z * [new branch] gh/jansel/531/orig -> origin/gh/jansel/531/orig 2025-09-07T06:39:17.5784559Z * [new branch] gh/jbschlosser/208/head -> origin/gh/jbschlosser/208/head 2025-09-07T06:39:17.5784759Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-09-07T06:39:17.5784955Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-09-07T06:39:17.5786180Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-09-07T06:39:17.5786378Z * [new branch] gh/jbschlosser/248/base -> origin/gh/jbschlosser/248/base 2025-09-07T06:39:17.5786629Z * [new branch] gh/jbschlosser/248/head -> origin/gh/jbschlosser/248/head 2025-09-07T06:39:17.5786824Z * [new branch] gh/jbschlosser/248/orig -> origin/gh/jbschlosser/248/orig 2025-09-07T06:39:17.5787015Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 
2025-09-07T06:39:17.5787207Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-09-07T06:39:17.5787397Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-09-07T06:39:17.5787587Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-09-07T06:39:17.5787774Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-09-07T06:39:17.5787960Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-09-07T06:39:17.5788146Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-09-07T06:39:17.5788330Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-09-07T06:39:17.5788576Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-09-07T06:39:17.5789834Z * [new branch] gh/jiayisunx/64/base -> origin/gh/jiayisunx/64/base 2025-09-07T06:39:17.5790018Z * [new branch] gh/jiayisunx/64/head -> origin/gh/jiayisunx/64/head 2025-09-07T06:39:17.5790202Z * [new branch] gh/jiayisunx/64/orig -> origin/gh/jiayisunx/64/orig 2025-09-07T06:39:17.5790386Z * [new branch] gh/jiayisunx/65/base -> origin/gh/jiayisunx/65/base 2025-09-07T06:39:17.5790568Z * [new branch] gh/jiayisunx/65/head -> origin/gh/jiayisunx/65/head 2025-09-07T06:39:17.5790752Z * [new branch] gh/jiayisunx/65/orig -> origin/gh/jiayisunx/65/orig 2025-09-07T06:39:17.5790993Z * [new branch] gh/jiayisunx/66/base -> origin/gh/jiayisunx/66/base 2025-09-07T06:39:17.5791178Z * [new branch] gh/jiayisunx/66/head -> origin/gh/jiayisunx/66/head 2025-09-07T06:39:17.5791362Z * [new branch] gh/jiayisunx/66/orig -> origin/gh/jiayisunx/66/orig 2025-09-07T06:39:17.5791547Z * [new branch] gh/jiayisunx/67/base -> origin/gh/jiayisunx/67/base 2025-09-07T06:39:17.5791732Z * [new branch] gh/jiayisunx/67/head -> origin/gh/jiayisunx/67/head 2025-09-07T06:39:17.5791916Z * [new branch] gh/jiayisunx/67/orig -> origin/gh/jiayisunx/67/orig 2025-09-07T06:39:17.5793162Z * [new branch] gh/jiayisunx/68/base -> 
origin/gh/jiayisunx/68/base 2025-09-07T06:39:17.5793348Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-09-07T06:39:17.5793531Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-09-07T06:39:17.5793714Z * [new branch] gh/jiayisunx/69/base -> origin/gh/jiayisunx/69/base 2025-09-07T06:39:17.5793897Z * [new branch] gh/jiayisunx/69/head -> origin/gh/jiayisunx/69/head 2025-09-07T06:39:17.5794081Z * [new branch] gh/jiayisunx/69/orig -> origin/gh/jiayisunx/69/orig 2025-09-07T06:39:17.5794263Z * [new branch] gh/jiayisunx/70/base -> origin/gh/jiayisunx/70/base 2025-09-07T06:39:17.5794447Z * [new branch] gh/jiayisunx/70/head -> origin/gh/jiayisunx/70/head 2025-09-07T06:39:17.5794630Z * [new branch] gh/jiayisunx/70/orig -> origin/gh/jiayisunx/70/orig 2025-09-07T06:39:17.5794812Z * [new branch] gh/jiayisunx/71/base -> origin/gh/jiayisunx/71/base 2025-09-07T06:39:17.5794995Z * [new branch] gh/jiayisunx/71/head -> origin/gh/jiayisunx/71/head 2025-09-07T06:39:17.5795180Z * [new branch] gh/jiayisunx/71/orig -> origin/gh/jiayisunx/71/orig 2025-09-07T06:39:17.5796394Z * [new branch] gh/jiayisunx/72/base -> origin/gh/jiayisunx/72/base 2025-09-07T06:39:17.5796660Z * [new branch] gh/jiayisunx/72/head -> origin/gh/jiayisunx/72/head 2025-09-07T06:39:17.5796843Z * [new branch] gh/jiayisunx/72/orig -> origin/gh/jiayisunx/72/orig 2025-09-07T06:39:17.5797027Z * [new branch] gh/jiayisunx/73/base -> origin/gh/jiayisunx/73/base 2025-09-07T06:39:17.5797212Z * [new branch] gh/jiayisunx/73/head -> origin/gh/jiayisunx/73/head 2025-09-07T06:39:17.5797395Z * [new branch] gh/jiayisunx/73/orig -> origin/gh/jiayisunx/73/orig 2025-09-07T06:39:17.5797578Z * [new branch] gh/jiayisunx/74/base -> origin/gh/jiayisunx/74/base 2025-09-07T06:39:17.5797762Z * [new branch] gh/jiayisunx/74/head -> origin/gh/jiayisunx/74/head 2025-09-07T06:39:17.5797949Z * [new branch] gh/jiayisunx/74/orig -> origin/gh/jiayisunx/74/orig 2025-09-07T06:39:17.5798198Z * [new branch] 
gh/jiayisunx/75/base -> origin/gh/jiayisunx/75/base 2025-09-07T06:39:17.5798438Z * [new branch] gh/jiayisunx/75/head -> origin/gh/jiayisunx/75/head 2025-09-07T06:39:17.5798622Z * [new branch] gh/jiayisunx/75/orig -> origin/gh/jiayisunx/75/orig 2025-09-07T06:39:17.5798806Z * [new branch] gh/jiayisunx/76/base -> origin/gh/jiayisunx/76/base 2025-09-07T06:39:17.5800068Z * [new branch] gh/jiayisunx/76/head -> origin/gh/jiayisunx/76/head 2025-09-07T06:39:17.5800252Z * [new branch] gh/jiayisunx/76/orig -> origin/gh/jiayisunx/76/orig 2025-09-07T06:39:17.5800444Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-09-07T06:39:17.5800634Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-09-07T06:39:17.5800901Z * [new branch] gh/justinchuby/111/base -> origin/gh/justinchuby/111/base 2025-09-07T06:39:17.5801096Z * [new branch] gh/justinchuby/111/head -> origin/gh/justinchuby/111/head 2025-09-07T06:39:17.5801289Z * [new branch] gh/justinchuby/111/orig -> origin/gh/justinchuby/111/orig 2025-09-07T06:39:17.5801482Z * [new branch] gh/justinchuby/112/base -> origin/gh/justinchuby/112/base 2025-09-07T06:39:17.5801676Z * [new branch] gh/justinchuby/112/head -> origin/gh/justinchuby/112/head 2025-09-07T06:39:17.5801869Z * [new branch] gh/justinchuby/112/orig -> origin/gh/justinchuby/112/orig 2025-09-07T06:39:17.5802062Z * [new branch] gh/justinchuby/113/base -> origin/gh/justinchuby/113/base 2025-09-07T06:39:17.5802255Z * [new branch] gh/justinchuby/113/head -> origin/gh/justinchuby/113/head 2025-09-07T06:39:17.5803763Z * [new branch] gh/justinchuby/113/orig -> origin/gh/justinchuby/113/orig 2025-09-07T06:39:17.5803958Z * [new branch] gh/justinchuby/114/base -> origin/gh/justinchuby/114/base 2025-09-07T06:39:17.5804152Z * [new branch] gh/justinchuby/114/head -> origin/gh/justinchuby/114/head 2025-09-07T06:39:17.5804346Z * [new branch] gh/justinchuby/114/orig -> origin/gh/justinchuby/114/orig 2025-09-07T06:39:17.5804538Z * [new 
branch] gh/justinchuby/115/base -> origin/gh/justinchuby/115/base 2025-09-07T06:39:17.5804730Z * [new branch] gh/justinchuby/115/head -> origin/gh/justinchuby/115/head 2025-09-07T06:39:17.5804923Z * [new branch] gh/justinchuby/115/orig -> origin/gh/justinchuby/115/orig 2025-09-07T06:39:17.5805113Z * [new branch] gh/karthickai/1/base -> origin/gh/karthickai/1/base 2025-09-07T06:39:17.5805301Z * [new branch] gh/karthickai/1/head -> origin/gh/karthickai/1/head 2025-09-07T06:39:17.5805489Z * [new branch] gh/karthickai/1/orig -> origin/gh/karthickai/1/orig 2025-09-07T06:39:17.5805675Z * [new branch] gh/karthickai/2/base -> origin/gh/karthickai/2/base 2025-09-07T06:39:17.5805861Z * [new branch] gh/karthickai/2/head -> origin/gh/karthickai/2/head 2025-09-07T06:39:17.5807186Z * [new branch] gh/karthickai/2/orig -> origin/gh/karthickai/2/orig 2025-09-07T06:39:17.5807383Z * [new branch] gh/kurtamohler/32/base -> origin/gh/kurtamohler/32/base 2025-09-07T06:39:17.5807578Z * [new branch] gh/kurtamohler/32/head -> origin/gh/kurtamohler/32/head 2025-09-07T06:39:17.5807770Z * [new branch] gh/kurtamohler/32/orig -> origin/gh/kurtamohler/32/orig 2025-09-07T06:39:17.5807961Z * [new branch] gh/kurtamohler/33/base -> origin/gh/kurtamohler/33/base 2025-09-07T06:39:17.5808152Z * [new branch] gh/kurtamohler/33/head -> origin/gh/kurtamohler/33/head 2025-09-07T06:39:17.5808343Z * [new branch] gh/kurtamohler/33/orig -> origin/gh/kurtamohler/33/orig 2025-09-07T06:39:17.5808534Z * [new branch] gh/kurtamohler/34/base -> origin/gh/kurtamohler/34/base 2025-09-07T06:39:17.5808789Z * [new branch] gh/kurtamohler/34/head -> origin/gh/kurtamohler/34/head 2025-09-07T06:39:17.5808980Z * [new branch] gh/kurtamohler/34/orig -> origin/gh/kurtamohler/34/orig 2025-09-07T06:39:17.5809170Z * [new branch] gh/kurtamohler/41/base -> origin/gh/kurtamohler/41/base 2025-09-07T06:39:17.5809360Z * [new branch] gh/kurtamohler/41/head -> origin/gh/kurtamohler/41/head 2025-09-07T06:39:17.5810622Z * [new branch] 
gh/kurtamohler/41/orig -> origin/gh/kurtamohler/41/orig 2025-09-07T06:39:17.5810814Z * [new branch] gh/kurtamohler/46/base -> origin/gh/kurtamohler/46/base 2025-09-07T06:39:17.5811066Z * [new branch] gh/kurtamohler/46/head -> origin/gh/kurtamohler/46/head 2025-09-07T06:39:17.5811253Z * [new branch] gh/kurtamohler/46/orig -> origin/gh/kurtamohler/46/orig 2025-09-07T06:39:17.5811445Z * [new branch] gh/kurtamohler/47/base -> origin/gh/kurtamohler/47/base 2025-09-07T06:39:17.5811635Z * [new branch] gh/kurtamohler/47/head -> origin/gh/kurtamohler/47/head 2025-09-07T06:39:17.5811823Z * [new branch] gh/kurtamohler/47/orig -> origin/gh/kurtamohler/47/orig 2025-09-07T06:39:17.5812012Z * [new branch] gh/kurtamohler/48/base -> origin/gh/kurtamohler/48/base 2025-09-07T06:39:17.5812200Z * [new branch] gh/kurtamohler/48/head -> origin/gh/kurtamohler/48/head 2025-09-07T06:39:17.5812390Z * [new branch] gh/kurtamohler/48/orig -> origin/gh/kurtamohler/48/orig 2025-09-07T06:39:17.5812580Z * [new branch] gh/kurtamohler/49/base -> origin/gh/kurtamohler/49/base 2025-09-07T06:39:17.5812770Z * [new branch] gh/kurtamohler/49/head -> origin/gh/kurtamohler/49/head 2025-09-07T06:39:17.5812959Z * [new branch] gh/kurtamohler/49/orig -> origin/gh/kurtamohler/49/orig 2025-09-07T06:39:17.5814223Z * [new branch] gh/kurtamohler/50/base -> origin/gh/kurtamohler/50/base 2025-09-07T06:39:17.5814415Z * [new branch] gh/kurtamohler/50/head -> origin/gh/kurtamohler/50/head 2025-09-07T06:39:17.5814605Z * [new branch] gh/kurtamohler/50/orig -> origin/gh/kurtamohler/50/orig 2025-09-07T06:39:17.5814791Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-09-07T06:39:17.5814974Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-09-07T06:39:17.5815155Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-09-07T06:39:17.5815340Z * [new branch] gh/kwen2501/15/base -> origin/gh/kwen2501/15/base 2025-09-07T06:39:17.5815522Z * [new branch] 
gh/kwen2501/15/head -> origin/gh/kwen2501/15/head 2025-09-07T06:39:17.5815700Z * [new branch] gh/kwen2501/156/base -> origin/gh/kwen2501/156/base 2025-09-07T06:39:17.5815883Z * [new branch] gh/kwen2501/156/head -> origin/gh/kwen2501/156/head 2025-09-07T06:39:17.5816064Z * [new branch] gh/kwen2501/156/orig -> origin/gh/kwen2501/156/orig 2025-09-07T06:39:17.5816245Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-09-07T06:39:17.5817573Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-09-07T06:39:17.5817757Z * [new branch] gh/kwen2501/186/base -> origin/gh/kwen2501/186/base 2025-09-07T06:39:17.5817937Z * [new branch] gh/kwen2501/186/head -> origin/gh/kwen2501/186/head 2025-09-07T06:39:17.5818117Z * [new branch] gh/kwen2501/186/orig -> origin/gh/kwen2501/186/orig 2025-09-07T06:39:17.5818297Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-09-07T06:39:17.5819713Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-09-07T06:39:17.5819896Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-09-07T06:39:17.5820076Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-09-07T06:39:17.5820256Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-09-07T06:39:17.5820436Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-09-07T06:39:17.5820617Z * [new branch] gh/kwen2501/194/base -> origin/gh/kwen2501/194/base 2025-09-07T06:39:17.5820797Z * [new branch] gh/kwen2501/194/head -> origin/gh/kwen2501/194/head 2025-09-07T06:39:17.5820942Z * [new branch] gh/kwen2501/194/orig -> origin/gh/kwen2501/194/orig 2025-09-07T06:39:17.5822168Z * [new branch] gh/kwen2501/199/base -> origin/gh/kwen2501/199/base 2025-09-07T06:39:17.5822246Z * [new branch] gh/kwen2501/199/head -> origin/gh/kwen2501/199/head 2025-09-07T06:39:17.5822319Z * [new branch] gh/kwen2501/199/orig -> origin/gh/kwen2501/199/orig 2025-09-07T06:39:17.5822391Z 
* [new branch] gh/kwen2501/200/base -> origin/gh/kwen2501/200/base 2025-09-07T06:39:17.5822462Z * [new branch] gh/kwen2501/200/head -> origin/gh/kwen2501/200/head 2025-09-07T06:39:17.5822533Z * [new branch] gh/kwen2501/200/orig -> origin/gh/kwen2501/200/orig 2025-09-07T06:39:17.5822606Z * [new branch] gh/kwen2501/201/base -> origin/gh/kwen2501/201/base 2025-09-07T06:39:17.5822679Z * [new branch] gh/kwen2501/201/head -> origin/gh/kwen2501/201/head 2025-09-07T06:39:17.5822751Z * [new branch] gh/kwen2501/201/orig -> origin/gh/kwen2501/201/orig 2025-09-07T06:39:17.5822823Z * [new branch] gh/kwen2501/203/base -> origin/gh/kwen2501/203/base 2025-09-07T06:39:17.5822896Z * [new branch] gh/kwen2501/203/head -> origin/gh/kwen2501/203/head 2025-09-07T06:39:17.5822968Z * [new branch] gh/kwen2501/203/orig -> origin/gh/kwen2501/203/orig 2025-09-07T06:39:17.5823041Z * [new branch] gh/kwen2501/204/base -> origin/gh/kwen2501/204/base 2025-09-07T06:39:17.5823112Z * [new branch] gh/kwen2501/204/head -> origin/gh/kwen2501/204/head 2025-09-07T06:39:17.5823183Z * [new branch] gh/kwen2501/204/orig -> origin/gh/kwen2501/204/orig 2025-09-07T06:39:17.5823256Z * [new branch] gh/kwen2501/205/base -> origin/gh/kwen2501/205/base 2025-09-07T06:39:17.5823329Z * [new branch] gh/kwen2501/205/head -> origin/gh/kwen2501/205/head 2025-09-07T06:39:17.5823400Z * [new branch] gh/kwen2501/205/orig -> origin/gh/kwen2501/205/orig 2025-09-07T06:39:17.5823472Z * [new branch] gh/kwen2501/206/base -> origin/gh/kwen2501/206/base 2025-09-07T06:39:17.5823545Z * [new branch] gh/kwen2501/206/head -> origin/gh/kwen2501/206/head 2025-09-07T06:39:17.5823617Z * [new branch] gh/kwen2501/206/orig -> origin/gh/kwen2501/206/orig 2025-09-07T06:39:17.5823689Z * [new branch] gh/kwen2501/207/base -> origin/gh/kwen2501/207/base 2025-09-07T06:39:17.5823761Z * [new branch] gh/kwen2501/207/head -> origin/gh/kwen2501/207/head 2025-09-07T06:39:17.5823832Z * [new branch] gh/kwen2501/207/orig -> origin/gh/kwen2501/207/orig 
2025-09-07T06:39:17.5824960Z * [new branch] gh/kwen2501/208/base -> origin/gh/kwen2501/208/base 2025-09-07T06:39:17.5825037Z * [new branch] gh/kwen2501/208/head -> origin/gh/kwen2501/208/head 2025-09-07T06:39:17.5825108Z * [new branch] gh/kwen2501/208/orig -> origin/gh/kwen2501/208/orig 2025-09-07T06:39:17.5825218Z * [new branch] gh/kwen2501/209/base -> origin/gh/kwen2501/209/base 2025-09-07T06:39:17.5825290Z * [new branch] gh/kwen2501/209/head -> origin/gh/kwen2501/209/head 2025-09-07T06:39:17.5825362Z * [new branch] gh/kwen2501/209/orig -> origin/gh/kwen2501/209/orig 2025-09-07T06:39:17.5825434Z * [new branch] gh/kwen2501/210/base -> origin/gh/kwen2501/210/base 2025-09-07T06:39:17.5825506Z * [new branch] gh/kwen2501/210/head -> origin/gh/kwen2501/210/head 2025-09-07T06:39:17.5825577Z * [new branch] gh/kwen2501/210/orig -> origin/gh/kwen2501/210/orig 2025-09-07T06:39:17.5825649Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-09-07T06:39:17.5825750Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-09-07T06:39:17.5825821Z * [new branch] gh/kwen2501/212/base -> origin/gh/kwen2501/212/base 2025-09-07T06:39:17.5825894Z * [new branch] gh/kwen2501/212/head -> origin/gh/kwen2501/212/head 2025-09-07T06:39:17.5825966Z * [new branch] gh/kwen2501/212/orig -> origin/gh/kwen2501/212/orig 2025-09-07T06:39:17.5826037Z * [new branch] gh/kwen2501/213/base -> origin/gh/kwen2501/213/base 2025-09-07T06:39:17.5826108Z * [new branch] gh/kwen2501/213/head -> origin/gh/kwen2501/213/head 2025-09-07T06:39:17.5826181Z * [new branch] gh/kwen2501/213/orig -> origin/gh/kwen2501/213/orig 2025-09-07T06:39:17.5826252Z * [new branch] gh/kwen2501/214/base -> origin/gh/kwen2501/214/base 2025-09-07T06:39:17.5826324Z * [new branch] gh/kwen2501/214/head -> origin/gh/kwen2501/214/head 2025-09-07T06:39:17.5826397Z * [new branch] gh/kwen2501/214/orig -> origin/gh/kwen2501/214/orig 2025-09-07T06:39:17.5826469Z * [new branch] gh/kwen2501/215/base -> 
origin/gh/kwen2501/215/base 2025-09-07T06:39:17.5826611Z * [new branch] gh/kwen2501/215/head -> origin/gh/kwen2501/215/head 2025-09-07T06:39:17.5826683Z * [new branch] gh/kwen2501/215/orig -> origin/gh/kwen2501/215/orig 2025-09-07T06:39:17.5826754Z * [new branch] gh/kwen2501/216/base -> origin/gh/kwen2501/216/base 2025-09-07T06:39:17.5826826Z * [new branch] gh/kwen2501/216/head -> origin/gh/kwen2501/216/head 2025-09-07T06:39:17.5827977Z * [new branch] gh/kwen2501/216/orig -> origin/gh/kwen2501/216/orig 2025-09-07T06:39:17.5828050Z * [new branch] gh/kwen2501/217/base -> origin/gh/kwen2501/217/base 2025-09-07T06:39:17.5828123Z * [new branch] gh/kwen2501/217/head -> origin/gh/kwen2501/217/head 2025-09-07T06:39:17.5828195Z * [new branch] gh/kwen2501/217/orig -> origin/gh/kwen2501/217/orig 2025-09-07T06:39:17.5828268Z * [new branch] gh/kwen2501/218/base -> origin/gh/kwen2501/218/base 2025-09-07T06:39:17.5828340Z * [new branch] gh/kwen2501/218/head -> origin/gh/kwen2501/218/head 2025-09-07T06:39:17.5828412Z * [new branch] gh/kwen2501/218/orig -> origin/gh/kwen2501/218/orig 2025-09-07T06:39:17.5828483Z * [new branch] gh/kwen2501/219/base -> origin/gh/kwen2501/219/base 2025-09-07T06:39:17.5828555Z * [new branch] gh/kwen2501/219/head -> origin/gh/kwen2501/219/head 2025-09-07T06:39:17.5828627Z * [new branch] gh/kwen2501/219/orig -> origin/gh/kwen2501/219/orig 2025-09-07T06:39:17.5828698Z * [new branch] gh/kwen2501/220/base -> origin/gh/kwen2501/220/base 2025-09-07T06:39:17.5828771Z * [new branch] gh/kwen2501/220/head -> origin/gh/kwen2501/220/head 2025-09-07T06:39:17.5828843Z * [new branch] gh/kwen2501/220/orig -> origin/gh/kwen2501/220/orig 2025-09-07T06:39:17.5829000Z * [new branch] gh/kwen2501/221/base -> origin/gh/kwen2501/221/base 2025-09-07T06:39:17.5829073Z * [new branch] gh/kwen2501/221/head -> origin/gh/kwen2501/221/head 2025-09-07T06:39:17.5829145Z * [new branch] gh/kwen2501/221/orig -> origin/gh/kwen2501/221/orig 2025-09-07T06:39:17.5829218Z * [new branch] 
gh/kwen2501/222/base -> origin/gh/kwen2501/222/base 2025-09-07T06:39:17.5829289Z * [new branch] gh/kwen2501/222/head -> origin/gh/kwen2501/222/head 2025-09-07T06:39:17.5829360Z * [new branch] gh/kwen2501/222/orig -> origin/gh/kwen2501/222/orig 2025-09-07T06:39:17.5829433Z * [new branch] gh/kwen2501/223/base -> origin/gh/kwen2501/223/base 2025-09-07T06:39:17.5829572Z * [new branch] gh/kwen2501/223/head -> origin/gh/kwen2501/223/head 2025-09-07T06:39:17.5829643Z * [new branch] gh/kwen2501/223/orig -> origin/gh/kwen2501/223/orig 2025-09-07T06:39:17.5829717Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-09-07T06:39:17.5829788Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-09-07T06:39:17.5830927Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-09-07T06:39:17.5831003Z * [new branch] gh/kwen2501/225/base -> origin/gh/kwen2501/225/base 2025-09-07T06:39:17.5831075Z * [new branch] gh/kwen2501/225/head -> origin/gh/kwen2501/225/head 2025-09-07T06:39:17.5831147Z * [new branch] gh/kwen2501/225/orig -> origin/gh/kwen2501/225/orig 2025-09-07T06:39:17.5831220Z * [new branch] gh/kwen2501/226/base -> origin/gh/kwen2501/226/base 2025-09-07T06:39:17.5831291Z * [new branch] gh/kwen2501/226/head -> origin/gh/kwen2501/226/head 2025-09-07T06:39:17.5831362Z * [new branch] gh/kwen2501/226/orig -> origin/gh/kwen2501/226/orig 2025-09-07T06:39:17.5831436Z * [new branch] gh/kwen2501/227/base -> origin/gh/kwen2501/227/base 2025-09-07T06:39:17.5831507Z * [new branch] gh/kwen2501/227/head -> origin/gh/kwen2501/227/head 2025-09-07T06:39:17.5831579Z * [new branch] gh/kwen2501/227/orig -> origin/gh/kwen2501/227/orig 2025-09-07T06:39:17.5831651Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-09-07T06:39:17.5831722Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-09-07T06:39:17.5831793Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 
2025-09-07T06:39:17.5831867Z * [new branch] gh/kwen2501/229/base -> origin/gh/kwen2501/229/base 2025-09-07T06:39:17.5831939Z * [new branch] gh/kwen2501/229/head -> origin/gh/kwen2501/229/head 2025-09-07T06:39:17.5832010Z * [new branch] gh/kwen2501/229/orig -> origin/gh/kwen2501/229/orig 2025-09-07T06:39:17.5832084Z * [new branch] gh/kwen2501/230/base -> origin/gh/kwen2501/230/base 2025-09-07T06:39:17.5832156Z * [new branch] gh/kwen2501/230/head -> origin/gh/kwen2501/230/head 2025-09-07T06:39:17.5832228Z * [new branch] gh/kwen2501/230/orig -> origin/gh/kwen2501/230/orig 2025-09-07T06:39:17.5832300Z * [new branch] gh/kwen2501/231/base -> origin/gh/kwen2501/231/base 2025-09-07T06:39:17.5832373Z * [new branch] gh/kwen2501/231/head -> origin/gh/kwen2501/231/head 2025-09-07T06:39:17.5832444Z * [new branch] gh/kwen2501/231/orig -> origin/gh/kwen2501/231/orig 2025-09-07T06:39:17.5832518Z * [new branch] gh/kwen2501/232/base -> origin/gh/kwen2501/232/base 2025-09-07T06:39:17.5832590Z * [new branch] gh/kwen2501/232/head -> origin/gh/kwen2501/232/head 2025-09-07T06:39:17.5833753Z * [new branch] gh/kwen2501/232/orig -> origin/gh/kwen2501/232/orig 2025-09-07T06:39:17.5833841Z * [new branch] gh/laithsakka/156/base -> origin/gh/laithsakka/156/base 2025-09-07T06:39:17.5833920Z * [new branch] gh/laithsakka/156/head -> origin/gh/laithsakka/156/head 2025-09-07T06:39:17.5833998Z * [new branch] gh/laithsakka/156/orig -> origin/gh/laithsakka/156/orig 2025-09-07T06:39:17.5834076Z * [new branch] gh/laithsakka/160/base -> origin/gh/laithsakka/160/base 2025-09-07T06:39:17.5834153Z * [new branch] gh/laithsakka/160/head -> origin/gh/laithsakka/160/head 2025-09-07T06:39:17.5834230Z * [new branch] gh/laithsakka/160/orig -> origin/gh/laithsakka/160/orig 2025-09-07T06:39:17.5834338Z * [new branch] gh/laithsakka/178/base -> origin/gh/laithsakka/178/base 2025-09-07T06:39:17.5834416Z * [new branch] gh/laithsakka/178/head -> origin/gh/laithsakka/178/head 2025-09-07T06:39:17.5834494Z * [new branch] 
gh/laithsakka/178/orig -> origin/gh/laithsakka/178/orig 2025-09-07T06:39:17.5834571Z * [new branch] gh/laithsakka/191/base -> origin/gh/laithsakka/191/base 2025-09-07T06:39:17.5834648Z * [new branch] gh/laithsakka/191/head -> origin/gh/laithsakka/191/head 2025-09-07T06:39:17.5834725Z * [new branch] gh/laithsakka/191/orig -> origin/gh/laithsakka/191/orig 2025-09-07T06:39:17.5834801Z * [new branch] gh/laithsakka/237/base -> origin/gh/laithsakka/237/base 2025-09-07T06:39:17.5834879Z * [new branch] gh/laithsakka/237/head -> origin/gh/laithsakka/237/head 2025-09-07T06:39:17.5834957Z * [new branch] gh/laithsakka/237/orig -> origin/gh/laithsakka/237/orig 2025-09-07T06:39:17.5835034Z * [new branch] gh/laithsakka/249/base -> origin/gh/laithsakka/249/base 2025-09-07T06:39:17.5835111Z * [new branch] gh/laithsakka/249/head -> origin/gh/laithsakka/249/head 2025-09-07T06:39:17.5835190Z * [new branch] gh/laithsakka/249/orig -> origin/gh/laithsakka/249/orig 2025-09-07T06:39:17.5835266Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-09-07T06:39:17.5835344Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-09-07T06:39:17.5835421Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-09-07T06:39:17.5835497Z * [new branch] gh/laithsakka/254/base -> origin/gh/laithsakka/254/base 2025-09-07T06:39:17.5835575Z * [new branch] gh/laithsakka/254/head -> origin/gh/laithsakka/254/head 2025-09-07T06:39:17.5836806Z * [new branch] gh/laithsakka/254/orig -> origin/gh/laithsakka/254/orig 2025-09-07T06:39:17.5836889Z * [new branch] gh/laithsakka/255/base -> origin/gh/laithsakka/255/base 2025-09-07T06:39:17.5836968Z * [new branch] gh/laithsakka/255/head -> origin/gh/laithsakka/255/head 2025-09-07T06:39:17.5837046Z * [new branch] gh/laithsakka/255/orig -> origin/gh/laithsakka/255/orig 2025-09-07T06:39:17.5837123Z * [new branch] gh/laithsakka/256/base -> origin/gh/laithsakka/256/base 2025-09-07T06:39:17.5837201Z * [new branch] 
gh/laithsakka/256/head -> origin/gh/laithsakka/256/head 2025-09-07T06:39:17.5837278Z * [new branch] gh/laithsakka/256/orig -> origin/gh/laithsakka/256/orig 2025-09-07T06:39:17.5837354Z * [new branch] gh/laithsakka/257/base -> origin/gh/laithsakka/257/base 2025-09-07T06:39:17.5837433Z * [new branch] gh/laithsakka/257/head -> origin/gh/laithsakka/257/head 2025-09-07T06:39:17.5837509Z * [new branch] gh/laithsakka/257/orig -> origin/gh/laithsakka/257/orig 2025-09-07T06:39:17.5837586Z * [new branch] gh/laithsakka/258/base -> origin/gh/laithsakka/258/base 2025-09-07T06:39:17.5837736Z * [new branch] gh/laithsakka/258/head -> origin/gh/laithsakka/258/head 2025-09-07T06:39:17.5837814Z * [new branch] gh/laithsakka/258/orig -> origin/gh/laithsakka/258/orig 2025-09-07T06:39:17.5837891Z * [new branch] gh/laithsakka/259/base -> origin/gh/laithsakka/259/base 2025-09-07T06:39:17.5838079Z * [new branch] gh/laithsakka/259/head -> origin/gh/laithsakka/259/head 2025-09-07T06:39:17.5838157Z * [new branch] gh/laithsakka/259/orig -> origin/gh/laithsakka/259/orig 2025-09-07T06:39:17.5838233Z * [new branch] gh/laithsakka/260/base -> origin/gh/laithsakka/260/base 2025-09-07T06:39:17.5838364Z * [new branch] gh/laithsakka/260/head -> origin/gh/laithsakka/260/head 2025-09-07T06:39:17.5838442Z * [new branch] gh/laithsakka/260/orig -> origin/gh/laithsakka/260/orig 2025-09-07T06:39:17.5838521Z * [new branch] gh/laithsakka/261/base -> origin/gh/laithsakka/261/base 2025-09-07T06:39:17.5838598Z * [new branch] gh/laithsakka/261/head -> origin/gh/laithsakka/261/head 2025-09-07T06:39:17.5838677Z * [new branch] gh/laithsakka/261/orig -> origin/gh/laithsakka/261/orig 2025-09-07T06:39:17.5838754Z * [new branch] gh/laithsakka/262/base -> origin/gh/laithsakka/262/base 2025-09-07T06:39:17.5838830Z * [new branch] gh/laithsakka/262/head -> origin/gh/laithsakka/262/head 2025-09-07T06:39:17.5839993Z * [new branch] gh/laithsakka/262/orig -> origin/gh/laithsakka/262/orig 2025-09-07T06:39:17.5840073Z * [new branch] 
gh/laithsakka/263/base -> origin/gh/laithsakka/263/base 2025-09-07T06:39:17.5840151Z * [new branch] gh/laithsakka/263/head -> origin/gh/laithsakka/263/head 2025-09-07T06:39:17.5840229Z * [new branch] gh/laithsakka/263/orig -> origin/gh/laithsakka/263/orig 2025-09-07T06:39:17.5840308Z * [new branch] gh/laithsakka/264/base -> origin/gh/laithsakka/264/base 2025-09-07T06:39:17.5840385Z * [new branch] gh/laithsakka/264/head -> origin/gh/laithsakka/264/head 2025-09-07T06:39:17.5840463Z * [new branch] gh/laithsakka/264/orig -> origin/gh/laithsakka/264/orig 2025-09-07T06:39:17.5840539Z * [new branch] gh/laithsakka/265/base -> origin/gh/laithsakka/265/base 2025-09-07T06:39:17.5840615Z * [new branch] gh/laithsakka/265/head -> origin/gh/laithsakka/265/head 2025-09-07T06:39:17.5840692Z * [new branch] gh/laithsakka/265/orig -> origin/gh/laithsakka/265/orig 2025-09-07T06:39:17.5840771Z * [new branch] gh/laithsakka/266/base -> origin/gh/laithsakka/266/base 2025-09-07T06:39:17.5840847Z * [new branch] gh/laithsakka/266/head -> origin/gh/laithsakka/266/head 2025-09-07T06:39:17.5840923Z * [new branch] gh/laithsakka/266/orig -> origin/gh/laithsakka/266/orig 2025-09-07T06:39:17.5841002Z * [new branch] gh/laithsakka/267/base -> origin/gh/laithsakka/267/base 2025-09-07T06:39:17.5841078Z * [new branch] gh/laithsakka/267/head -> origin/gh/laithsakka/267/head 2025-09-07T06:39:17.5841155Z * [new branch] gh/laithsakka/267/orig -> origin/gh/laithsakka/267/orig 2025-09-07T06:39:17.5841232Z * [new branch] gh/laithsakka/268/base -> origin/gh/laithsakka/268/base 2025-09-07T06:39:17.5841309Z * [new branch] gh/laithsakka/268/head -> origin/gh/laithsakka/268/head 2025-09-07T06:39:17.5841387Z * [new branch] gh/laithsakka/268/orig -> origin/gh/laithsakka/268/orig 2025-09-07T06:39:17.5841468Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-09-07T06:39:17.5841545Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-09-07T06:39:17.5841663Z * [new branch] 
gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-09-07T06:39:17.5841739Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-09-07T06:39:17.5841815Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-09-07T06:39:17.5842936Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-09-07T06:39:17.5843015Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-09-07T06:39:17.5843090Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-09-07T06:39:17.5843169Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-09-07T06:39:17.5843282Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-09-07T06:39:17.5843361Z * [new branch] gh/lucaskabela/10/base -> origin/gh/lucaskabela/10/base 2025-09-07T06:39:17.5843441Z * [new branch] gh/lucaskabela/10/head -> origin/gh/lucaskabela/10/head 2025-09-07T06:39:17.5843519Z * [new branch] gh/lucaskabela/10/orig -> origin/gh/lucaskabela/10/orig 2025-09-07T06:39:17.5843596Z * [new branch] gh/lucaskabela/11/base -> origin/gh/lucaskabela/11/base 2025-09-07T06:39:17.5843673Z * [new branch] gh/lucaskabela/11/head -> origin/gh/lucaskabela/11/head 2025-09-07T06:39:17.5843752Z * [new branch] gh/lucaskabela/11/orig -> origin/gh/lucaskabela/11/orig 2025-09-07T06:39:17.5843829Z * [new branch] gh/lucaskabela/12/base -> origin/gh/lucaskabela/12/base 2025-09-07T06:39:17.5843907Z * [new branch] gh/lucaskabela/12/head -> origin/gh/lucaskabela/12/head 2025-09-07T06:39:17.5843986Z * [new branch] gh/lucaskabela/12/orig -> origin/gh/lucaskabela/12/orig 2025-09-07T06:39:17.5844064Z * [new branch] gh/lucaskabela/13/base -> origin/gh/lucaskabela/13/base 2025-09-07T06:39:17.5844141Z * [new branch] gh/lucaskabela/13/head -> origin/gh/lucaskabela/13/head 2025-09-07T06:39:17.5844219Z * [new branch] gh/lucaskabela/13/orig -> origin/gh/lucaskabela/13/orig 2025-09-07T06:39:17.5844296Z * [new branch] 
gh/lucaskabela/14/base -> origin/gh/lucaskabela/14/base 2025-09-07T06:39:17.5844373Z * [new branch] gh/lucaskabela/14/head -> origin/gh/lucaskabela/14/head 2025-09-07T06:39:17.5844451Z * [new branch] gh/lucaskabela/14/orig -> origin/gh/lucaskabela/14/orig 2025-09-07T06:39:17.5844528Z * [new branch] gh/lucaskabela/15/base -> origin/gh/lucaskabela/15/base 2025-09-07T06:39:17.5844607Z * [new branch] gh/lucaskabela/15/head -> origin/gh/lucaskabela/15/head 2025-09-07T06:39:17.5844685Z * [new branch] gh/lucaskabela/15/orig -> origin/gh/lucaskabela/15/orig 2025-09-07T06:39:17.5844763Z * [new branch] gh/lucaskabela/16/base -> origin/gh/lucaskabela/16/base 2025-09-07T06:39:17.5844839Z * [new branch] gh/lucaskabela/16/head -> origin/gh/lucaskabela/16/head 2025-09-07T06:39:17.5845965Z * [new branch] gh/lucaskabela/16/orig -> origin/gh/lucaskabela/16/orig 2025-09-07T06:39:17.5846044Z * [new branch] gh/lucaskabela/17/base -> origin/gh/lucaskabela/17/base 2025-09-07T06:39:17.5846121Z * [new branch] gh/lucaskabela/17/head -> origin/gh/lucaskabela/17/head 2025-09-07T06:39:17.5846199Z * [new branch] gh/lucaskabela/17/orig -> origin/gh/lucaskabela/17/orig 2025-09-07T06:39:17.5846278Z * [new branch] gh/lucaskabela/2/base -> origin/gh/lucaskabela/2/base 2025-09-07T06:39:17.5846355Z * [new branch] gh/lucaskabela/2/head -> origin/gh/lucaskabela/2/head 2025-09-07T06:39:17.5846432Z * [new branch] gh/lucaskabela/2/orig -> origin/gh/lucaskabela/2/orig 2025-09-07T06:39:17.5846666Z * [new branch] gh/lucaskabela/3/base -> origin/gh/lucaskabela/3/base 2025-09-07T06:39:17.5846743Z * [new branch] gh/lucaskabela/3/head -> origin/gh/lucaskabela/3/head 2025-09-07T06:39:17.5846820Z * [new branch] gh/lucaskabela/3/orig -> origin/gh/lucaskabela/3/orig 2025-09-07T06:39:17.5846895Z * [new branch] gh/lucaskabela/4/base -> origin/gh/lucaskabela/4/base 2025-09-07T06:39:17.5846971Z * [new branch] gh/lucaskabela/4/head -> origin/gh/lucaskabela/4/head 2025-09-07T06:39:17.5847048Z * [new branch] 
gh/lucaskabela/4/orig -> origin/gh/lucaskabela/4/orig 2025-09-07T06:39:17.5847185Z * [new branch] gh/lucaskabela/5/base -> origin/gh/lucaskabela/5/base 2025-09-07T06:39:17.5847261Z * [new branch] gh/lucaskabela/5/head -> origin/gh/lucaskabela/5/head 2025-09-07T06:39:17.5847339Z * [new branch] gh/lucaskabela/5/orig -> origin/gh/lucaskabela/5/orig 2025-09-07T06:39:17.5847416Z * [new branch] gh/lucaskabela/6/base -> origin/gh/lucaskabela/6/base 2025-09-07T06:39:17.5847491Z * [new branch] gh/lucaskabela/6/head -> origin/gh/lucaskabela/6/head 2025-09-07T06:39:17.5847566Z * [new branch] gh/lucaskabela/6/orig -> origin/gh/lucaskabela/6/orig 2025-09-07T06:39:17.5847643Z * [new branch] gh/lucaskabela/7/base -> origin/gh/lucaskabela/7/base 2025-09-07T06:39:17.5847718Z * [new branch] gh/lucaskabela/7/head -> origin/gh/lucaskabela/7/head 2025-09-07T06:39:17.5847795Z * [new branch] gh/lucaskabela/7/orig -> origin/gh/lucaskabela/7/orig 2025-09-07T06:39:17.5847874Z * [new branch] gh/lucaskabela/8/base -> origin/gh/lucaskabela/8/base 2025-09-07T06:39:17.5847949Z * [new branch] gh/lucaskabela/8/head -> origin/gh/lucaskabela/8/head 2025-09-07T06:39:17.5849088Z * [new branch] gh/lucaskabela/8/orig -> origin/gh/lucaskabela/8/orig 2025-09-07T06:39:17.5849168Z * [new branch] gh/lucaskabela/9/base -> origin/gh/lucaskabela/9/base 2025-09-07T06:39:17.5849244Z * [new branch] gh/lucaskabela/9/head -> origin/gh/lucaskabela/9/head 2025-09-07T06:39:17.5849320Z * [new branch] gh/lucaskabela/9/orig -> origin/gh/lucaskabela/9/orig 2025-09-07T06:39:17.5849390Z * [new branch] gh/lw/3/base -> origin/gh/lw/3/base 2025-09-07T06:39:17.5849456Z * [new branch] gh/lw/3/head -> origin/gh/lw/3/head 2025-09-07T06:39:17.5849518Z * [new branch] gh/lw/3/orig -> origin/gh/lw/3/orig 2025-09-07T06:39:17.5849594Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-09-07T06:39:17.5849669Z * [new branch] gh/malfet/330/base -> origin/gh/malfet/330/base 2025-09-07T06:39:17.5849744Z * [new branch] 
gh/malfet/330/head -> origin/gh/malfet/330/head 2025-09-07T06:39:17.5849816Z * [new branch] gh/malfet/330/orig -> origin/gh/malfet/330/orig 2025-09-07T06:39:17.5849886Z * [new branch] gh/malfet/396/base -> origin/gh/malfet/396/base 2025-09-07T06:39:17.5849957Z * [new branch] gh/malfet/396/head -> origin/gh/malfet/396/head 2025-09-07T06:39:17.5850027Z * [new branch] gh/malfet/396/orig -> origin/gh/malfet/396/orig 2025-09-07T06:39:17.5850097Z * [new branch] gh/malfet/397/base -> origin/gh/malfet/397/base 2025-09-07T06:39:17.5850167Z * [new branch] gh/malfet/397/head -> origin/gh/malfet/397/head 2025-09-07T06:39:17.5850239Z * [new branch] gh/malfet/397/orig -> origin/gh/malfet/397/orig 2025-09-07T06:39:17.5850309Z * [new branch] gh/malfet/398/base -> origin/gh/malfet/398/base 2025-09-07T06:39:17.5850418Z * [new branch] gh/malfet/398/head -> origin/gh/malfet/398/head 2025-09-07T06:39:17.5850489Z * [new branch] gh/malfet/398/orig -> origin/gh/malfet/398/orig 2025-09-07T06:39:17.5850560Z * [new branch] gh/malfet/399/base -> origin/gh/malfet/399/base 2025-09-07T06:39:17.5850630Z * [new branch] gh/malfet/399/head -> origin/gh/malfet/399/head 2025-09-07T06:39:17.5850700Z * [new branch] gh/malfet/399/orig -> origin/gh/malfet/399/orig 2025-09-07T06:39:17.5850770Z * [new branch] gh/malfet/414/base -> origin/gh/malfet/414/base 2025-09-07T06:39:17.5851920Z * [new branch] gh/malfet/414/head -> origin/gh/malfet/414/head 2025-09-07T06:39:17.5851995Z * [new branch] gh/malfet/414/orig -> origin/gh/malfet/414/orig 2025-09-07T06:39:17.5852065Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-09-07T06:39:17.5852136Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-09-07T06:39:17.5852207Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-09-07T06:39:17.5852277Z * [new branch] gh/malfet/418/base -> origin/gh/malfet/418/base 2025-09-07T06:39:17.5852347Z * [new branch] gh/malfet/418/head -> origin/gh/malfet/418/head 
2025-09-07T06:39:17.5852417Z * [new branch] gh/malfet/418/orig -> origin/gh/malfet/418/orig 2025-09-07T06:39:17.5852490Z * [new branch] gh/malfet/475/base -> origin/gh/malfet/475/base 2025-09-07T06:39:17.5852561Z * [new branch] gh/malfet/475/head -> origin/gh/malfet/475/head 2025-09-07T06:39:17.5852631Z * [new branch] gh/malfet/475/orig -> origin/gh/malfet/475/orig 2025-09-07T06:39:17.5852703Z * [new branch] gh/malfet/476/base -> origin/gh/malfet/476/base 2025-09-07T06:39:17.5852775Z * [new branch] gh/malfet/476/head -> origin/gh/malfet/476/head 2025-09-07T06:39:17.5852845Z * [new branch] gh/malfet/476/orig -> origin/gh/malfet/476/orig 2025-09-07T06:39:17.5852915Z * [new branch] gh/malfet/477/base -> origin/gh/malfet/477/base 2025-09-07T06:39:17.5852986Z * [new branch] gh/malfet/477/head -> origin/gh/malfet/477/head 2025-09-07T06:39:17.5853055Z * [new branch] gh/malfet/477/orig -> origin/gh/malfet/477/orig 2025-09-07T06:39:17.5853126Z * [new branch] gh/malfet/478/base -> origin/gh/malfet/478/base 2025-09-07T06:39:17.5853197Z * [new branch] gh/malfet/478/head -> origin/gh/malfet/478/head 2025-09-07T06:39:17.5853267Z * [new branch] gh/malfet/478/orig -> origin/gh/malfet/478/orig 2025-09-07T06:39:17.5853338Z * [new branch] gh/malfet/479/base -> origin/gh/malfet/479/base 2025-09-07T06:39:17.5853409Z * [new branch] gh/malfet/479/head -> origin/gh/malfet/479/head 2025-09-07T06:39:17.5853479Z * [new branch] gh/malfet/479/orig -> origin/gh/malfet/479/orig 2025-09-07T06:39:17.5853550Z * [new branch] gh/malfet/480/base -> origin/gh/malfet/480/base 2025-09-07T06:39:17.5854654Z * [new branch] gh/malfet/480/head -> origin/gh/malfet/480/head 2025-09-07T06:39:17.5854727Z * [new branch] gh/malfet/480/orig -> origin/gh/malfet/480/orig 2025-09-07T06:39:17.5854797Z * [new branch] gh/malfet/481/base -> origin/gh/malfet/481/base 2025-09-07T06:39:17.5854869Z * [new branch] gh/malfet/481/head -> origin/gh/malfet/481/head 2025-09-07T06:39:17.5854939Z * [new branch] gh/malfet/481/orig -> 
origin/gh/malfet/481/orig 2025-09-07T06:39:17.5855010Z * [new branch] gh/malfet/482/base -> origin/gh/malfet/482/base 2025-09-07T06:39:17.5855117Z * [new branch] gh/malfet/482/head -> origin/gh/malfet/482/head 2025-09-07T06:39:17.5855187Z * [new branch] gh/malfet/482/orig -> origin/gh/malfet/482/orig 2025-09-07T06:39:17.5855259Z * [new branch] gh/malfet/483/base -> origin/gh/malfet/483/base 2025-09-07T06:39:17.5855328Z * [new branch] gh/malfet/483/head -> origin/gh/malfet/483/head 2025-09-07T06:39:17.5855398Z * [new branch] gh/malfet/483/orig -> origin/gh/malfet/483/orig 2025-09-07T06:39:17.5855468Z * [new branch] gh/malfet/484/base -> origin/gh/malfet/484/base 2025-09-07T06:39:17.5855569Z * [new branch] gh/malfet/484/head -> origin/gh/malfet/484/head 2025-09-07T06:39:17.5855639Z * [new branch] gh/malfet/484/orig -> origin/gh/malfet/484/orig 2025-09-07T06:39:17.5855710Z * [new branch] gh/malfet/485/base -> origin/gh/malfet/485/base 2025-09-07T06:39:17.5855781Z * [new branch] gh/malfet/485/head -> origin/gh/malfet/485/head 2025-09-07T06:39:17.5855851Z * [new branch] gh/malfet/485/orig -> origin/gh/malfet/485/orig 2025-09-07T06:39:17.5855921Z * [new branch] gh/malfet/486/base -> origin/gh/malfet/486/base 2025-09-07T06:39:17.5855992Z * [new branch] gh/malfet/486/head -> origin/gh/malfet/486/head 2025-09-07T06:39:17.5856062Z * [new branch] gh/malfet/486/orig -> origin/gh/malfet/486/orig 2025-09-07T06:39:17.5856131Z * [new branch] gh/malfet/487/base -> origin/gh/malfet/487/base 2025-09-07T06:39:17.5856203Z * [new branch] gh/malfet/487/head -> origin/gh/malfet/487/head 2025-09-07T06:39:17.5856274Z * [new branch] gh/malfet/487/orig -> origin/gh/malfet/487/orig 2025-09-07T06:39:17.5856345Z * [new branch] gh/malfet/488/base -> origin/gh/malfet/488/base 2025-09-07T06:39:17.5857538Z * [new branch] gh/malfet/488/head -> origin/gh/malfet/488/head 2025-09-07T06:39:17.5857612Z * [new branch] gh/malfet/488/orig -> origin/gh/malfet/488/orig 2025-09-07T06:39:17.5857682Z * [new 
branch] gh/malfet/489/base -> origin/gh/malfet/489/base 2025-09-07T06:39:17.5857753Z * [new branch] gh/malfet/489/head -> origin/gh/malfet/489/head 2025-09-07T06:39:17.5857823Z * [new branch] gh/malfet/489/orig -> origin/gh/malfet/489/orig 2025-09-07T06:39:17.5857894Z * [new branch] gh/malfet/490/base -> origin/gh/malfet/490/base 2025-09-07T06:39:17.5857967Z * [new branch] gh/malfet/490/head -> origin/gh/malfet/490/head 2025-09-07T06:39:17.5858037Z * [new branch] gh/malfet/490/orig -> origin/gh/malfet/490/orig 2025-09-07T06:39:17.5858109Z * [new branch] gh/malfet/491/base -> origin/gh/malfet/491/base 2025-09-07T06:39:17.5858179Z * [new branch] gh/malfet/491/head -> origin/gh/malfet/491/head 2025-09-07T06:39:17.5858250Z * [new branch] gh/malfet/491/orig -> origin/gh/malfet/491/orig 2025-09-07T06:39:17.5858320Z * [new branch] gh/malfet/492/base -> origin/gh/malfet/492/base 2025-09-07T06:39:17.5858391Z * [new branch] gh/malfet/492/head -> origin/gh/malfet/492/head 2025-09-07T06:39:17.5858461Z * [new branch] gh/malfet/492/orig -> origin/gh/malfet/492/orig 2025-09-07T06:39:17.5858531Z * [new branch] gh/malfet/493/base -> origin/gh/malfet/493/base 2025-09-07T06:39:17.5858602Z * [new branch] gh/malfet/493/head -> origin/gh/malfet/493/head 2025-09-07T06:39:17.5858673Z * [new branch] gh/malfet/493/orig -> origin/gh/malfet/493/orig 2025-09-07T06:39:17.5858810Z * [new branch] gh/malfet/494/base -> origin/gh/malfet/494/base 2025-09-07T06:39:17.5858882Z * [new branch] gh/malfet/494/head -> origin/gh/malfet/494/head 2025-09-07T06:39:17.5858953Z * [new branch] gh/malfet/494/orig -> origin/gh/malfet/494/orig 2025-09-07T06:39:17.5859024Z * [new branch] gh/malfet/495/base -> origin/gh/malfet/495/base 2025-09-07T06:39:17.5859094Z * [new branch] gh/malfet/495/head -> origin/gh/malfet/495/head 2025-09-07T06:39:17.5859165Z * [new branch] gh/malfet/495/orig -> origin/gh/malfet/495/orig 2025-09-07T06:39:17.5859235Z * [new branch] gh/malfet/496/base -> origin/gh/malfet/496/base 
2025-09-07T06:39:17.5859368Z * [new branch] gh/malfet/496/head -> origin/gh/malfet/496/head 2025-09-07T06:39:17.5860499Z * [new branch] gh/malfet/496/orig -> origin/gh/malfet/496/orig 2025-09-07T06:39:17.5860572Z * [new branch] gh/malfet/497/base -> origin/gh/malfet/497/base 2025-09-07T06:39:17.5860642Z * [new branch] gh/malfet/497/head -> origin/gh/malfet/497/head 2025-09-07T06:39:17.5860714Z * [new branch] gh/malfet/497/orig -> origin/gh/malfet/497/orig 2025-09-07T06:39:17.5860784Z * [new branch] gh/malfet/498/base -> origin/gh/malfet/498/base 2025-09-07T06:39:17.5860854Z * [new branch] gh/malfet/498/head -> origin/gh/malfet/498/head 2025-09-07T06:39:17.5860925Z * [new branch] gh/malfet/498/orig -> origin/gh/malfet/498/orig 2025-09-07T06:39:17.5860995Z * [new branch] gh/malfet/499/base -> origin/gh/malfet/499/base 2025-09-07T06:39:17.5861066Z * [new branch] gh/malfet/499/head -> origin/gh/malfet/499/head 2025-09-07T06:39:17.5861138Z * [new branch] gh/malfet/499/orig -> origin/gh/malfet/499/orig 2025-09-07T06:39:17.5861209Z * [new branch] gh/malfet/500/base -> origin/gh/malfet/500/base 2025-09-07T06:39:17.5861279Z * [new branch] gh/malfet/500/head -> origin/gh/malfet/500/head 2025-09-07T06:39:17.5861349Z * [new branch] gh/malfet/500/orig -> origin/gh/malfet/500/orig 2025-09-07T06:39:17.5861419Z * [new branch] gh/malfet/501/base -> origin/gh/malfet/501/base 2025-09-07T06:39:17.5861489Z * [new branch] gh/malfet/501/head -> origin/gh/malfet/501/head 2025-09-07T06:39:17.5861561Z * [new branch] gh/malfet/501/orig -> origin/gh/malfet/501/orig 2025-09-07T06:39:17.5861633Z * [new branch] gh/malfet/502/base -> origin/gh/malfet/502/base 2025-09-07T06:39:17.5861703Z * [new branch] gh/malfet/502/head -> origin/gh/malfet/502/head 2025-09-07T06:39:17.5861773Z * [new branch] gh/malfet/502/orig -> origin/gh/malfet/502/orig 2025-09-07T06:39:17.5861846Z * [new branch] gh/malfet/503/base -> origin/gh/malfet/503/base 2025-09-07T06:39:17.5861916Z * [new branch] gh/malfet/503/head -> 
origin/gh/malfet/503/head 2025-09-07T06:39:17.5861986Z * [new branch] gh/malfet/503/orig -> origin/gh/malfet/503/orig 2025-09-07T06:39:17.5862057Z * [new branch] gh/malfet/504/base -> origin/gh/malfet/504/base 2025-09-07T06:39:17.5862127Z * [new branch] gh/malfet/504/head -> origin/gh/malfet/504/head 2025-09-07T06:39:17.5863229Z * [new branch] gh/malfet/504/orig -> origin/gh/malfet/504/orig 2025-09-07T06:39:17.5863304Z * [new branch] gh/malfet/505/base -> origin/gh/malfet/505/base 2025-09-07T06:39:17.5863373Z * [new branch] gh/malfet/505/head -> origin/gh/malfet/505/head 2025-09-07T06:39:17.5863443Z * [new branch] gh/malfet/505/orig -> origin/gh/malfet/505/orig 2025-09-07T06:39:17.5863641Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-09-07T06:39:17.5863713Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-09-07T06:39:17.5863782Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-09-07T06:39:17.5863853Z * [new branch] gh/malfet/507/base -> origin/gh/malfet/507/base 2025-09-07T06:39:17.5863923Z * [new branch] gh/malfet/507/head -> origin/gh/malfet/507/head 2025-09-07T06:39:17.5863992Z * [new branch] gh/malfet/507/orig -> origin/gh/malfet/507/orig 2025-09-07T06:39:17.5864093Z * [new branch] gh/malfet/508/base -> origin/gh/malfet/508/base 2025-09-07T06:39:17.5864163Z * [new branch] gh/malfet/508/head -> origin/gh/malfet/508/head 2025-09-07T06:39:17.5864234Z * [new branch] gh/malfet/508/orig -> origin/gh/malfet/508/orig 2025-09-07T06:39:17.5864306Z * [new branch] gh/malfet/509/base -> origin/gh/malfet/509/base 2025-09-07T06:39:17.5864376Z * [new branch] gh/malfet/509/head -> origin/gh/malfet/509/head 2025-09-07T06:39:17.5864446Z * [new branch] gh/malfet/509/orig -> origin/gh/malfet/509/orig 2025-09-07T06:39:17.5864518Z * [new branch] gh/malfet/510/base -> origin/gh/malfet/510/base 2025-09-07T06:39:17.5864588Z * [new branch] gh/malfet/510/head -> origin/gh/malfet/510/head 2025-09-07T06:39:17.5864658Z * [new 
branch] gh/malfet/510/orig -> origin/gh/malfet/510/orig 2025-09-07T06:39:17.5864730Z * [new branch] gh/malfet/511/base -> origin/gh/malfet/511/base 2025-09-07T06:39:17.5864800Z * [new branch] gh/malfet/511/head -> origin/gh/malfet/511/head 2025-09-07T06:39:17.5864870Z * [new branch] gh/malfet/511/orig -> origin/gh/malfet/511/orig 2025-09-07T06:39:17.5864942Z * [new branch] gh/malfet/512/base -> origin/gh/malfet/512/base 2025-09-07T06:39:17.5865012Z * [new branch] gh/malfet/512/head -> origin/gh/malfet/512/head 2025-09-07T06:39:17.5866133Z * [new branch] gh/malfet/512/orig -> origin/gh/malfet/512/orig 2025-09-07T06:39:17.5866207Z * [new branch] gh/malfet/513/base -> origin/gh/malfet/513/base 2025-09-07T06:39:17.5866277Z * [new branch] gh/malfet/513/head -> origin/gh/malfet/513/head 2025-09-07T06:39:17.5866347Z * [new branch] gh/malfet/513/orig -> origin/gh/malfet/513/orig 2025-09-07T06:39:17.5866421Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-09-07T06:39:17.5866556Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-09-07T06:39:17.5866652Z * [new branch] gh/manuelcandales/10/base -> origin/gh/manuelcandales/10/base 2025-09-07T06:39:17.5866738Z * [new branch] gh/manuelcandales/10/head -> origin/gh/manuelcandales/10/head 2025-09-07T06:39:17.5866824Z * [new branch] gh/manuelcandales/10/orig -> origin/gh/manuelcandales/10/orig 2025-09-07T06:39:17.5866907Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-09-07T06:39:17.5866991Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-09-07T06:39:17.5867075Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-09-07T06:39:17.5867162Z * [new branch] gh/manuelcandales/9/base -> origin/gh/manuelcandales/9/base 2025-09-07T06:39:17.5867247Z * [new branch] gh/manuelcandales/9/head -> origin/gh/manuelcandales/9/head 2025-09-07T06:39:17.5867330Z * [new branch] gh/manuelcandales/9/orig -> 
origin/gh/manuelcandales/9/orig 2025-09-07T06:39:17.5867477Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-09-07T06:39:17.5867557Z * [new branch] gh/masnesral/204/base -> origin/gh/masnesral/204/base 2025-09-07T06:39:17.5867635Z * [new branch] gh/masnesral/204/head -> origin/gh/masnesral/204/head 2025-09-07T06:39:17.5867711Z * [new branch] gh/masnesral/204/orig -> origin/gh/masnesral/204/orig 2025-09-07T06:39:17.5867786Z * [new branch] gh/masnesral/235/base -> origin/gh/masnesral/235/base 2025-09-07T06:39:17.5867862Z * [new branch] gh/masnesral/235/head -> origin/gh/masnesral/235/head 2025-09-07T06:39:17.5867993Z * [new branch] gh/masnesral/235/orig -> origin/gh/masnesral/235/orig 2025-09-07T06:39:17.5868069Z * [new branch] gh/masnesral/34/base -> origin/gh/masnesral/34/base 2025-09-07T06:39:17.5868147Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-09-07T06:39:17.5869284Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-09-07T06:39:17.5869361Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-09-07T06:39:17.5869436Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-09-07T06:39:17.5869509Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-09-07T06:39:17.5869588Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-09-07T06:39:17.5869661Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-09-07T06:39:17.5869735Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-09-07T06:39:17.5869807Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-09-07T06:39:17.5869882Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-09-07T06:39:17.5869954Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-09-07T06:39:17.5870027Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-09-07T06:39:17.5870100Z * [new branch] gh/mhorowitz/6/base -> 
origin/gh/mhorowitz/6/base 2025-09-07T06:39:17.5870172Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-09-07T06:39:17.5870274Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-09-07T06:39:17.5870369Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-09-07T06:39:17.5870464Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-09-07T06:39:17.5870555Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-09-07T06:39:17.5870647Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-09-07T06:39:17.5870738Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-09-07T06:39:17.5870828Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-09-07T06:39:17.5870917Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-09-07T06:39:17.5871008Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-09-07T06:39:17.5871097Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-09-07T06:39:17.5871188Z * [new branch] gh/mikaylagawarecki/317/base -> origin/gh/mikaylagawarecki/317/base 2025-09-07T06:39:17.5872315Z * [new branch] gh/mikaylagawarecki/317/head -> origin/gh/mikaylagawarecki/317/head 2025-09-07T06:39:17.5872444Z * [new branch] gh/mikaylagawarecki/317/orig -> origin/gh/mikaylagawarecki/317/orig 2025-09-07T06:39:17.5872535Z * [new branch] gh/mikaylagawarecki/320/base -> origin/gh/mikaylagawarecki/320/base 2025-09-07T06:39:17.5872626Z * [new branch] gh/mikaylagawarecki/320/head -> origin/gh/mikaylagawarecki/320/head 2025-09-07T06:39:17.5872716Z * [new branch] gh/mikaylagawarecki/320/orig -> origin/gh/mikaylagawarecki/320/orig 2025-09-07T06:39:17.5872805Z * [new branch] gh/mikaylagawarecki/329/base -> 
origin/gh/mikaylagawarecki/329/base 2025-09-07T06:39:17.5872896Z * [new branch] gh/mikaylagawarecki/329/head -> origin/gh/mikaylagawarecki/329/head 2025-09-07T06:39:17.5873015Z * [new branch] gh/mikaylagawarecki/329/orig -> origin/gh/mikaylagawarecki/329/orig 2025-09-07T06:39:17.5873104Z * [new branch] gh/mikaylagawarecki/330/base -> origin/gh/mikaylagawarecki/330/base 2025-09-07T06:39:17.5873195Z * [new branch] gh/mikaylagawarecki/330/head -> origin/gh/mikaylagawarecki/330/head 2025-09-07T06:39:17.5873285Z * [new branch] gh/mikaylagawarecki/330/orig -> origin/gh/mikaylagawarecki/330/orig 2025-09-07T06:39:17.5873374Z * [new branch] gh/mikaylagawarecki/331/base -> origin/gh/mikaylagawarecki/331/base 2025-09-07T06:39:17.5873465Z * [new branch] gh/mikaylagawarecki/331/head -> origin/gh/mikaylagawarecki/331/head 2025-09-07T06:39:17.5873555Z * [new branch] gh/mikaylagawarecki/331/orig -> origin/gh/mikaylagawarecki/331/orig 2025-09-07T06:39:17.5873644Z * [new branch] gh/mikaylagawarecki/332/base -> origin/gh/mikaylagawarecki/332/base 2025-09-07T06:39:17.5873736Z * [new branch] gh/mikaylagawarecki/332/head -> origin/gh/mikaylagawarecki/332/head 2025-09-07T06:39:17.5873826Z * [new branch] gh/mikaylagawarecki/332/orig -> origin/gh/mikaylagawarecki/332/orig 2025-09-07T06:39:17.5873917Z * [new branch] gh/mikaylagawarecki/334/base -> origin/gh/mikaylagawarecki/334/base 2025-09-07T06:39:17.5874008Z * [new branch] gh/mikaylagawarecki/334/head -> origin/gh/mikaylagawarecki/334/head 2025-09-07T06:39:17.5874097Z * [new branch] gh/mikaylagawarecki/334/orig -> origin/gh/mikaylagawarecki/334/orig 2025-09-07T06:39:17.5874186Z * [new branch] gh/mikaylagawarecki/335/base -> origin/gh/mikaylagawarecki/335/base 2025-09-07T06:39:17.5874277Z * [new branch] gh/mikaylagawarecki/335/head -> origin/gh/mikaylagawarecki/335/head 2025-09-07T06:39:17.5874366Z * [new branch] gh/mikaylagawarecki/335/orig -> origin/gh/mikaylagawarecki/335/orig 2025-09-07T06:39:17.5874456Z * [new branch] 
gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-09-07T06:39:17.5875595Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-09-07T06:39:17.5875690Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-09-07T06:39:17.5875780Z * [new branch] gh/mikaylagawarecki/337/base -> origin/gh/mikaylagawarecki/337/base 2025-09-07T06:39:17.5875871Z * [new branch] gh/mikaylagawarecki/337/head -> origin/gh/mikaylagawarecki/337/head 2025-09-07T06:39:17.5875961Z * [new branch] gh/mikaylagawarecki/337/orig -> origin/gh/mikaylagawarecki/337/orig 2025-09-07T06:39:17.5876051Z * [new branch] gh/mikaylagawarecki/338/base -> origin/gh/mikaylagawarecki/338/base 2025-09-07T06:39:17.5876141Z * [new branch] gh/mikaylagawarecki/338/head -> origin/gh/mikaylagawarecki/338/head 2025-09-07T06:39:17.5876231Z * [new branch] gh/mikaylagawarecki/338/orig -> origin/gh/mikaylagawarecki/338/orig 2025-09-07T06:39:17.5876321Z * [new branch] gh/mikaylagawarecki/339/base -> origin/gh/mikaylagawarecki/339/base 2025-09-07T06:39:17.5876411Z * [new branch] gh/mikaylagawarecki/339/head -> origin/gh/mikaylagawarecki/339/head 2025-09-07T06:39:17.5876639Z * [new branch] gh/mikaylagawarecki/339/orig -> origin/gh/mikaylagawarecki/339/orig 2025-09-07T06:39:17.5876714Z * [new branch] gh/mlazos/1/base -> origin/gh/mlazos/1/base 2025-09-07T06:39:17.5876787Z * [new branch] gh/mlazos/1/head -> origin/gh/mlazos/1/head 2025-09-07T06:39:17.5876856Z * [new branch] gh/mlazos/1/orig -> origin/gh/mlazos/1/orig 2025-09-07T06:39:17.5876929Z * [new branch] gh/mlazos/12/base -> origin/gh/mlazos/12/base 2025-09-07T06:39:17.5876999Z * [new branch] gh/mlazos/12/head -> origin/gh/mlazos/12/head 2025-09-07T06:39:17.5877134Z * [new branch] gh/mlazos/12/orig -> origin/gh/mlazos/12/orig 2025-09-07T06:39:17.5877203Z * [new branch] gh/mlazos/13/base -> origin/gh/mlazos/13/base 2025-09-07T06:39:17.5877273Z * [new branch] gh/mlazos/13/head -> 
origin/gh/mlazos/13/head 2025-09-07T06:39:17.5877343Z * [new branch] gh/mlazos/13/orig -> origin/gh/mlazos/13/orig 2025-09-07T06:39:17.5877412Z * [new branch] gh/mlazos/14/base -> origin/gh/mlazos/14/base 2025-09-07T06:39:17.5877480Z * [new branch] gh/mlazos/14/head -> origin/gh/mlazos/14/head 2025-09-07T06:39:17.5877548Z * [new branch] gh/mlazos/14/orig -> origin/gh/mlazos/14/orig 2025-09-07T06:39:17.5877617Z * [new branch] gh/mlazos/15/base -> origin/gh/mlazos/15/base 2025-09-07T06:39:17.5877686Z * [new branch] gh/mlazos/15/head -> origin/gh/mlazos/15/head 2025-09-07T06:39:17.5878961Z * [new branch] gh/mlazos/15/orig -> origin/gh/mlazos/15/orig 2025-09-07T06:39:17.5879031Z * [new branch] gh/mlazos/16/base -> origin/gh/mlazos/16/base 2025-09-07T06:39:17.5879100Z * [new branch] gh/mlazos/16/head -> origin/gh/mlazos/16/head 2025-09-07T06:39:17.5879170Z * [new branch] gh/mlazos/16/orig -> origin/gh/mlazos/16/orig 2025-09-07T06:39:17.5879238Z * [new branch] gh/mlazos/17/base -> origin/gh/mlazos/17/base 2025-09-07T06:39:17.5879307Z * [new branch] gh/mlazos/17/head -> origin/gh/mlazos/17/head 2025-09-07T06:39:17.5879376Z * [new branch] gh/mlazos/17/orig -> origin/gh/mlazos/17/orig 2025-09-07T06:39:17.5879445Z * [new branch] gh/mlazos/2/base -> origin/gh/mlazos/2/base 2025-09-07T06:39:17.5879513Z * [new branch] gh/mlazos/2/head -> origin/gh/mlazos/2/head 2025-09-07T06:39:17.5879583Z * [new branch] gh/mlazos/2/orig -> origin/gh/mlazos/2/orig 2025-09-07T06:39:17.5879650Z * [new branch] gh/mlazos/3/base -> origin/gh/mlazos/3/base 2025-09-07T06:39:17.5879718Z * [new branch] gh/mlazos/3/head -> origin/gh/mlazos/3/head 2025-09-07T06:39:17.5879786Z * [new branch] gh/mlazos/3/orig -> origin/gh/mlazos/3/orig 2025-09-07T06:39:17.5879858Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-09-07T06:39:17.5879929Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-09-07T06:39:17.5880007Z * [new branch] gh/muchulee8/62/base -> origin/gh/muchulee8/62/base 
2025-09-07T06:39:17.5880084Z * [new branch] gh/muchulee8/62/head -> origin/gh/muchulee8/62/head 2025-09-07T06:39:17.5880158Z * [new branch] gh/muchulee8/62/orig -> origin/gh/muchulee8/62/orig 2025-09-07T06:39:17.5880234Z * [new branch] gh/muchulee8/63/base -> origin/gh/muchulee8/63/base 2025-09-07T06:39:17.5880308Z * [new branch] gh/muchulee8/63/head -> origin/gh/muchulee8/63/head 2025-09-07T06:39:17.5880426Z * [new branch] gh/muchulee8/63/orig -> origin/gh/muchulee8/63/orig 2025-09-07T06:39:17.5880500Z * [new branch] gh/muchulee8/64/base -> origin/gh/muchulee8/64/base 2025-09-07T06:39:17.5880573Z * [new branch] gh/muchulee8/64/head -> origin/gh/muchulee8/64/head 2025-09-07T06:39:17.5880646Z * [new branch] gh/muchulee8/64/orig -> origin/gh/muchulee8/64/orig 2025-09-07T06:39:17.5881768Z * [new branch] gh/muchulee8/65/base -> origin/gh/muchulee8/65/base 2025-09-07T06:39:17.5881846Z * [new branch] gh/muchulee8/65/head -> origin/gh/muchulee8/65/head 2025-09-07T06:39:17.5881918Z * [new branch] gh/muchulee8/65/orig -> origin/gh/muchulee8/65/orig 2025-09-07T06:39:17.5882041Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-09-07T06:39:17.5882127Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-09-07T06:39:17.5882212Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-09-07T06:39:17.5882294Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-09-07T06:39:17.5882377Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-09-07T06:39:17.5882457Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-09-07T06:39:17.5882538Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-09-07T06:39:17.5882621Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-09-07T06:39:17.5882704Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 
2025-09-07T06:39:17.5882786Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-09-07T06:39:17.5882869Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-09-07T06:39:17.5882951Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-09-07T06:39:17.5883032Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-09-07T06:39:17.5883114Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-09-07T06:39:17.5883196Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-09-07T06:39:17.5883277Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-09-07T06:39:17.5883359Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-09-07T06:39:17.5883442Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-09-07T06:39:17.5883515Z * [new branch] gh/oulgen/35/base -> origin/gh/oulgen/35/base 2025-09-07T06:39:17.5883587Z * [new branch] gh/oulgen/35/head -> origin/gh/oulgen/35/head 2025-09-07T06:39:17.5883656Z * [new branch] gh/oulgen/35/orig -> origin/gh/oulgen/35/orig 2025-09-07T06:39:17.5884780Z * [new branch] gh/oulgen/48/base -> origin/gh/oulgen/48/base 2025-09-07T06:39:17.5884854Z * [new branch] gh/oulgen/48/head -> origin/gh/oulgen/48/head 2025-09-07T06:39:17.5884922Z * [new branch] gh/oulgen/48/orig -> origin/gh/oulgen/48/orig 2025-09-07T06:39:17.5884991Z * [new branch] gh/oulgen/49/base -> origin/gh/oulgen/49/base 2025-09-07T06:39:17.5885060Z * [new branch] gh/oulgen/49/head -> origin/gh/oulgen/49/head 2025-09-07T06:39:17.5885130Z * [new branch] gh/oulgen/49/orig -> origin/gh/oulgen/49/orig 2025-09-07T06:39:17.5885203Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-09-07T06:39:17.5885308Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-09-07T06:39:17.5885379Z * [new branch] gh/pearu/108/orig -> 
origin/gh/pearu/108/orig 2025-09-07T06:39:17.5885448Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-09-07T06:39:17.5885516Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-09-07T06:39:17.5885585Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-09-07T06:39:17.5885654Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-09-07T06:39:17.5885722Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-09-07T06:39:17.5885820Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-09-07T06:39:17.5885889Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-09-07T06:39:17.5885959Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-09-07T06:39:17.5886028Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-09-07T06:39:17.5886097Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-09-07T06:39:17.5886165Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-09-07T06:39:17.5886235Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-09-07T06:39:17.5886303Z * [new branch] gh/pearu/113/base -> origin/gh/pearu/113/base 2025-09-07T06:39:17.5886372Z * [new branch] gh/pearu/113/head -> origin/gh/pearu/113/head 2025-09-07T06:39:17.5886442Z * [new branch] gh/pearu/113/orig -> origin/gh/pearu/113/orig 2025-09-07T06:39:17.5887639Z * [new branch] gh/pearu/114/base -> origin/gh/pearu/114/base 2025-09-07T06:39:17.5887714Z * [new branch] gh/pearu/114/head -> origin/gh/pearu/114/head 2025-09-07T06:39:17.5887784Z * [new branch] gh/pearu/114/orig -> origin/gh/pearu/114/orig 2025-09-07T06:39:17.5887852Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-09-07T06:39:17.5887920Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-09-07T06:39:17.5887990Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-09-07T06:39:17.5888058Z * [new branch] gh/pearu/116/base -> 
origin/gh/pearu/116/base 2025-09-07T06:39:17.5888128Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-09-07T06:39:17.5888196Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-09-07T06:39:17.5888266Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-09-07T06:39:17.5888335Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-09-07T06:39:17.5888404Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-09-07T06:39:17.5888474Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-09-07T06:39:17.5888543Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-09-07T06:39:17.5888611Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-09-07T06:39:17.5888680Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-09-07T06:39:17.5888747Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-09-07T06:39:17.5888815Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-09-07T06:39:17.5888886Z * [new branch] gh/qqaatw/29/base -> origin/gh/qqaatw/29/base 2025-09-07T06:39:17.5889018Z * [new branch] gh/qqaatw/29/head -> origin/gh/qqaatw/29/head 2025-09-07T06:39:17.5889088Z * [new branch] gh/qqaatw/29/orig -> origin/gh/qqaatw/29/orig 2025-09-07T06:39:17.5889178Z * [new branch] gh/raymo/refresh-script -> origin/gh/raymo/refresh-script 2025-09-07T06:39:17.5889248Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-09-07T06:39:17.5889315Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-09-07T06:39:17.5890433Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-09-07T06:39:17.5890502Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-09-07T06:39:17.5890636Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-09-07T06:39:17.5890701Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-09-07T06:39:17.5890767Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 
2025-09-07T06:39:17.5890832Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-09-07T06:39:17.5890898Z * [new branch] gh/rec/156/base -> origin/gh/rec/156/base 2025-09-07T06:39:17.5890963Z * [new branch] gh/rec/156/head -> origin/gh/rec/156/head 2025-09-07T06:39:17.5891027Z * [new branch] gh/rec/156/orig -> origin/gh/rec/156/orig 2025-09-07T06:39:17.5891093Z * [new branch] gh/rec/160/base -> origin/gh/rec/160/base 2025-09-07T06:39:17.5891157Z * [new branch] gh/rec/160/head -> origin/gh/rec/160/head 2025-09-07T06:39:17.5891224Z * [new branch] gh/rec/160/orig -> origin/gh/rec/160/orig 2025-09-07T06:39:17.5891288Z * [new branch] gh/rec/162/base -> origin/gh/rec/162/base 2025-09-07T06:39:17.5891356Z * [new branch] gh/rec/162/head -> origin/gh/rec/162/head 2025-09-07T06:39:17.5891420Z * [new branch] gh/rec/162/orig -> origin/gh/rec/162/orig 2025-09-07T06:39:17.5891485Z * [new branch] gh/rec/163/base -> origin/gh/rec/163/base 2025-09-07T06:39:17.5891551Z * [new branch] gh/rec/163/head -> origin/gh/rec/163/head 2025-09-07T06:39:17.5891616Z * [new branch] gh/rec/163/orig -> origin/gh/rec/163/orig 2025-09-07T06:39:17.5891680Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-09-07T06:39:17.5891746Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-09-07T06:39:17.5891812Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-09-07T06:39:17.5891877Z * [new branch] gh/rec/165/base -> origin/gh/rec/165/base 2025-09-07T06:39:17.5891943Z * [new branch] gh/rec/165/head -> origin/gh/rec/165/head 2025-09-07T06:39:17.5892009Z * [new branch] gh/rec/165/orig -> origin/gh/rec/165/orig 2025-09-07T06:39:17.5892072Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-09-07T06:39:17.5893183Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-09-07T06:39:17.5893249Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-09-07T06:39:17.5893340Z * [new branch] gh/robert-hardwick/1/base -> origin/gh/robert-hardwick/1/base 
2025-09-07T06:39:17.5893429Z * [new branch] gh/robert-hardwick/1/head -> origin/gh/robert-hardwick/1/head 2025-09-07T06:39:17.5893516Z * [new branch] gh/robert-hardwick/1/orig -> origin/gh/robert-hardwick/1/orig 2025-09-07T06:39:17.5893600Z * [new branch] gh/robert-hardwick/2/base -> origin/gh/robert-hardwick/2/base 2025-09-07T06:39:17.5893722Z * [new branch] gh/robert-hardwick/2/head -> origin/gh/robert-hardwick/2/head 2025-09-07T06:39:17.5893805Z * [new branch] gh/robert-hardwick/2/orig -> origin/gh/robert-hardwick/2/orig 2025-09-07T06:39:17.5893888Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-09-07T06:39:17.5893971Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-09-07T06:39:17.5894054Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-09-07T06:39:17.5894136Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-09-07T06:39:17.5894264Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-09-07T06:39:17.5894346Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-09-07T06:39:17.5894419Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-09-07T06:39:17.5894490Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-09-07T06:39:17.5894562Z * [new branch] gh/rtimpe/10/base -> origin/gh/rtimpe/10/base 2025-09-07T06:39:17.5894633Z * [new branch] gh/rtimpe/10/head -> origin/gh/rtimpe/10/head 2025-09-07T06:39:17.5894702Z * [new branch] gh/rtimpe/10/orig -> origin/gh/rtimpe/10/orig 2025-09-07T06:39:17.5894773Z * [new branch] gh/rtimpe/11/base -> origin/gh/rtimpe/11/base 2025-09-07T06:39:17.5894842Z * [new branch] gh/rtimpe/11/head -> origin/gh/rtimpe/11/head 2025-09-07T06:39:17.5894912Z * [new branch] gh/rtimpe/11/orig -> origin/gh/rtimpe/11/orig 2025-09-07T06:39:17.5894982Z * [new branch] gh/rtimpe/12/base -> origin/gh/rtimpe/12/base 
2025-09-07T06:39:17.5895051Z * [new branch] gh/rtimpe/12/head -> origin/gh/rtimpe/12/head 2025-09-07T06:39:17.5896165Z * [new branch] gh/rtimpe/12/orig -> origin/gh/rtimpe/12/orig 2025-09-07T06:39:17.5896237Z * [new branch] gh/rtimpe/13/base -> origin/gh/rtimpe/13/base 2025-09-07T06:39:17.5896307Z * [new branch] gh/rtimpe/13/head -> origin/gh/rtimpe/13/head 2025-09-07T06:39:17.5896375Z * [new branch] gh/rtimpe/13/orig -> origin/gh/rtimpe/13/orig 2025-09-07T06:39:17.5896445Z * [new branch] gh/rtimpe/14/base -> origin/gh/rtimpe/14/base 2025-09-07T06:39:17.5896691Z * [new branch] gh/rtimpe/14/head -> origin/gh/rtimpe/14/head 2025-09-07T06:39:17.5896763Z * [new branch] gh/rtimpe/14/orig -> origin/gh/rtimpe/14/orig 2025-09-07T06:39:17.5896832Z * [new branch] gh/rtimpe/15/base -> origin/gh/rtimpe/15/base 2025-09-07T06:39:17.5896902Z * [new branch] gh/rtimpe/15/head -> origin/gh/rtimpe/15/head 2025-09-07T06:39:17.5896970Z * [new branch] gh/rtimpe/15/orig -> origin/gh/rtimpe/15/orig 2025-09-07T06:39:17.5897040Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-09-07T06:39:17.5897108Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-09-07T06:39:17.5897175Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-09-07T06:39:17.5897244Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-09-07T06:39:17.5897312Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-09-07T06:39:17.5897380Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-09-07T06:39:17.5897448Z * [new branch] gh/rtimpe/9/base -> origin/gh/rtimpe/9/base 2025-09-07T06:39:17.5897571Z * [new branch] gh/rtimpe/9/head -> origin/gh/rtimpe/9/head 2025-09-07T06:39:17.5897639Z * [new branch] gh/rtimpe/9/orig -> origin/gh/rtimpe/9/orig 2025-09-07T06:39:17.5897723Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-09-07T06:39:17.5897806Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 
2025-09-07T06:39:17.5897886Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-09-07T06:39:17.5897964Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-09-07T06:39:17.5898044Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-09-07T06:39:17.5899236Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-09-07T06:39:17.5899319Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-09-07T06:39:17.5899398Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-09-07T06:39:17.5899476Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-09-07T06:39:17.5899554Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-09-07T06:39:17.5899632Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-09-07T06:39:17.5899710Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-09-07T06:39:17.5899788Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-09-07T06:39:17.5899868Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-09-07T06:39:17.5899946Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-09-07T06:39:17.5900024Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-09-07T06:39:17.5900103Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-09-07T06:39:17.5900182Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-09-07T06:39:17.5900260Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-09-07T06:39:17.5900339Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-09-07T06:39:17.5900417Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-09-07T06:39:17.5900485Z * [new branch] gh/sarckk/2/base 
-> origin/gh/sarckk/2/base 2025-09-07T06:39:17.5900556Z * [new branch] gh/sarckk/2/head -> origin/gh/sarckk/2/head 2025-09-07T06:39:17.5900623Z * [new branch] gh/sarckk/2/orig -> origin/gh/sarckk/2/orig 2025-09-07T06:39:17.5900704Z * [new branch] gh/seemethere/35/base -> origin/gh/seemethere/35/base 2025-09-07T06:39:17.5900782Z * [new branch] gh/seemethere/35/head -> origin/gh/seemethere/35/head 2025-09-07T06:39:17.5900858Z * [new branch] gh/seemethere/35/orig -> origin/gh/seemethere/35/orig 2025-09-07T06:39:17.5900934Z * [new branch] gh/seemethere/37/base -> origin/gh/seemethere/37/base 2025-09-07T06:39:17.5901010Z * [new branch] gh/seemethere/37/head -> origin/gh/seemethere/37/head 2025-09-07T06:39:17.5902125Z * [new branch] gh/seemethere/37/orig -> origin/gh/seemethere/37/orig 2025-09-07T06:39:17.5902203Z * [new branch] gh/seemethere/43/base -> origin/gh/seemethere/43/base 2025-09-07T06:39:17.5902281Z * [new branch] gh/seemethere/43/head -> origin/gh/seemethere/43/head 2025-09-07T06:39:17.5902356Z * [new branch] gh/seemethere/43/orig -> origin/gh/seemethere/43/orig 2025-09-07T06:39:17.5902467Z * [new branch] gh/seemethere/44/base -> origin/gh/seemethere/44/base 2025-09-07T06:39:17.5902544Z * [new branch] gh/seemethere/44/head -> origin/gh/seemethere/44/head 2025-09-07T06:39:17.5902619Z * [new branch] gh/seemethere/44/orig -> origin/gh/seemethere/44/orig 2025-09-07T06:39:17.5902695Z * [new branch] gh/seemethere/48/base -> origin/gh/seemethere/48/base 2025-09-07T06:39:17.5902772Z * [new branch] gh/seemethere/48/head -> origin/gh/seemethere/48/head 2025-09-07T06:39:17.5902846Z * [new branch] gh/seemethere/48/orig -> origin/gh/seemethere/48/orig 2025-09-07T06:39:17.5902950Z * [new branch] gh/seemethere/49/base -> origin/gh/seemethere/49/base 2025-09-07T06:39:17.5903024Z * [new branch] gh/seemethere/49/head -> origin/gh/seemethere/49/head 2025-09-07T06:39:17.5903100Z * [new branch] gh/seemethere/49/orig -> origin/gh/seemethere/49/orig 2025-09-07T06:39:17.5903177Z * 
[new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-09-07T06:39:17.5903252Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-09-07T06:39:17.5903329Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-09-07T06:39:17.5903404Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-09-07T06:39:17.5903479Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-09-07T06:39:17.5903555Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-09-07T06:39:17.5903632Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-09-07T06:39:17.5903708Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-09-07T06:39:17.5903785Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-09-07T06:39:17.5903861Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-09-07T06:39:17.5903936Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-09-07T06:39:17.5905051Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-09-07T06:39:17.5905131Z * [new branch] gh/seemethere/56/base -> origin/gh/seemethere/56/base 2025-09-07T06:39:17.5905206Z * [new branch] gh/seemethere/56/head -> origin/gh/seemethere/56/head 2025-09-07T06:39:17.5905282Z * [new branch] gh/seemethere/56/orig -> origin/gh/seemethere/56/orig 2025-09-07T06:39:17.5905359Z * [new branch] gh/seemethere/57/base -> origin/gh/seemethere/57/base 2025-09-07T06:39:17.5905434Z * [new branch] gh/seemethere/57/head -> origin/gh/seemethere/57/head 2025-09-07T06:39:17.5905512Z * [new branch] gh/seemethere/57/orig -> origin/gh/seemethere/57/orig 2025-09-07T06:39:17.5905587Z * [new branch] gh/seemethere/58/base -> origin/gh/seemethere/58/base 2025-09-07T06:39:17.5905661Z * [new branch] gh/seemethere/58/head -> origin/gh/seemethere/58/head 2025-09-07T06:39:17.5905737Z * [new branch] gh/seemethere/58/orig -> 
origin/gh/seemethere/58/orig 2025-09-07T06:39:17.5905812Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-09-07T06:39:17.5905887Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-09-07T06:39:17.5905965Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-09-07T06:39:17.5906041Z * [new branch] gh/seemethere/60/base -> origin/gh/seemethere/60/base 2025-09-07T06:39:17.5906115Z * [new branch] gh/seemethere/60/head -> origin/gh/seemethere/60/head 2025-09-07T06:39:17.5906235Z * [new branch] gh/seemethere/60/orig -> origin/gh/seemethere/60/orig 2025-09-07T06:39:17.5906312Z * [new branch] gh/seemethere/61/base -> origin/gh/seemethere/61/base 2025-09-07T06:39:17.5906388Z * [new branch] gh/seemethere/61/head -> origin/gh/seemethere/61/head 2025-09-07T06:39:17.5906463Z * [new branch] gh/seemethere/61/orig -> origin/gh/seemethere/61/orig 2025-09-07T06:39:17.5906635Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-09-07T06:39:17.5906711Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-09-07T06:39:17.5906840Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-09-07T06:39:17.5906917Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-09-07T06:39:17.5906993Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-09-07T06:39:17.5907068Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-09-07T06:39:17.5908219Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-09-07T06:39:17.5908298Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-09-07T06:39:17.5908377Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-09-07T06:39:17.5908455Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-09-07T06:39:17.5908533Z * [new branch] gh/shunting314/176/head -> 
origin/gh/shunting314/176/head 2025-09-07T06:39:17.5908612Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-09-07T06:39:17.5908690Z * [new branch] gh/shunting314/211/base -> origin/gh/shunting314/211/base 2025-09-07T06:39:17.5908769Z * [new branch] gh/shunting314/211/head -> origin/gh/shunting314/211/head 2025-09-07T06:39:17.5908847Z * [new branch] gh/shunting314/211/orig -> origin/gh/shunting314/211/orig 2025-09-07T06:39:17.5908926Z * [new branch] gh/shunting314/212/base -> origin/gh/shunting314/212/base 2025-09-07T06:39:17.5909004Z * [new branch] gh/shunting314/212/head -> origin/gh/shunting314/212/head 2025-09-07T06:39:17.5909082Z * [new branch] gh/shunting314/212/orig -> origin/gh/shunting314/212/orig 2025-09-07T06:39:17.5909161Z * [new branch] gh/shunting314/213/base -> origin/gh/shunting314/213/base 2025-09-07T06:39:17.5909240Z * [new branch] gh/shunting314/213/head -> origin/gh/shunting314/213/head 2025-09-07T06:39:17.5909317Z * [new branch] gh/shunting314/213/orig -> origin/gh/shunting314/213/orig 2025-09-07T06:39:17.5909397Z * [new branch] gh/shunting314/214/base -> origin/gh/shunting314/214/base 2025-09-07T06:39:17.5909474Z * [new branch] gh/shunting314/214/head -> origin/gh/shunting314/214/head 2025-09-07T06:39:17.5909553Z * [new branch] gh/shunting314/214/orig -> origin/gh/shunting314/214/orig 2025-09-07T06:39:17.5909631Z * [new branch] gh/shunting314/215/base -> origin/gh/shunting314/215/base 2025-09-07T06:39:17.5909709Z * [new branch] gh/shunting314/215/head -> origin/gh/shunting314/215/head 2025-09-07T06:39:17.5909786Z * [new branch] gh/shunting314/215/orig -> origin/gh/shunting314/215/orig 2025-09-07T06:39:17.5909864Z * [new branch] gh/shunting314/216/base -> origin/gh/shunting314/216/base 2025-09-07T06:39:17.5909943Z * [new branch] gh/shunting314/216/head -> origin/gh/shunting314/216/head 2025-09-07T06:39:17.5910020Z * [new branch] gh/shunting314/216/orig -> origin/gh/shunting314/216/orig 2025-09-07T06:39:17.5911217Z * 
[new branch] gh/shunting314/217/base -> origin/gh/shunting314/217/base 2025-09-07T06:39:17.5911300Z * [new branch] gh/shunting314/217/head -> origin/gh/shunting314/217/head 2025-09-07T06:39:17.5911377Z * [new branch] gh/shunting314/217/orig -> origin/gh/shunting314/217/orig 2025-09-07T06:39:17.5911455Z * [new branch] gh/shunting314/218/base -> origin/gh/shunting314/218/base 2025-09-07T06:39:17.5911533Z * [new branch] gh/shunting314/218/head -> origin/gh/shunting314/218/head 2025-09-07T06:39:17.5911611Z * [new branch] gh/shunting314/218/orig -> origin/gh/shunting314/218/orig 2025-09-07T06:39:17.5911724Z * [new branch] gh/shunting314/219/base -> origin/gh/shunting314/219/base 2025-09-07T06:39:17.5911803Z * [new branch] gh/shunting314/219/head -> origin/gh/shunting314/219/head 2025-09-07T06:39:17.5911881Z * [new branch] gh/shunting314/219/orig -> origin/gh/shunting314/219/orig 2025-09-07T06:39:17.5911959Z * [new branch] gh/shunting314/220/base -> origin/gh/shunting314/220/base 2025-09-07T06:39:17.5912038Z * [new branch] gh/shunting314/220/head -> origin/gh/shunting314/220/head 2025-09-07T06:39:17.5912115Z * [new branch] gh/shunting314/220/orig -> origin/gh/shunting314/220/orig 2025-09-07T06:39:17.5912193Z * [new branch] gh/shunting314/221/base -> origin/gh/shunting314/221/base 2025-09-07T06:39:17.5912271Z * [new branch] gh/shunting314/221/head -> origin/gh/shunting314/221/head 2025-09-07T06:39:17.5912348Z * [new branch] gh/shunting314/221/orig -> origin/gh/shunting314/221/orig 2025-09-07T06:39:17.5912427Z * [new branch] gh/shunting314/222/base -> origin/gh/shunting314/222/base 2025-09-07T06:39:17.5912505Z * [new branch] gh/shunting314/222/head -> origin/gh/shunting314/222/head 2025-09-07T06:39:17.5912584Z * [new branch] gh/shunting314/222/orig -> origin/gh/shunting314/222/orig 2025-09-07T06:39:17.5912663Z * [new branch] gh/shunting314/223/base -> origin/gh/shunting314/223/base 2025-09-07T06:39:17.5912741Z * [new branch] gh/shunting314/223/head -> 
origin/gh/shunting314/223/head 2025-09-07T06:39:17.5912818Z * [new branch] gh/shunting314/223/orig -> origin/gh/shunting314/223/orig 2025-09-07T06:39:17.5912896Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-09-07T06:39:17.5912974Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-09-07T06:39:17.5913051Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-09-07T06:39:17.5914170Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-09-07T06:39:17.5914248Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-09-07T06:39:17.5914322Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-09-07T06:39:17.5914395Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-09-07T06:39:17.5914468Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-09-07T06:39:17.5914547Z * [new branch] gh/sinhaanhsul/1/base -> origin/gh/sinhaanhsul/1/base 2025-09-07T06:39:17.5914623Z * [new branch] gh/sinhaanhsul/1/head -> origin/gh/sinhaanhsul/1/head 2025-09-07T06:39:17.5914698Z * [new branch] gh/skarjala/17/base -> origin/gh/skarjala/17/base 2025-09-07T06:39:17.5914773Z * [new branch] gh/skarjala/17/head -> origin/gh/skarjala/17/head 2025-09-07T06:39:17.5914845Z * [new branch] gh/skarjala/17/orig -> origin/gh/skarjala/17/orig 2025-09-07T06:39:17.5914918Z * [new branch] gh/skarjala/18/base -> origin/gh/skarjala/18/base 2025-09-07T06:39:17.5915027Z * [new branch] gh/skarjala/18/head -> origin/gh/skarjala/18/head 2025-09-07T06:39:17.5915101Z * [new branch] gh/skarjala/18/orig -> origin/gh/skarjala/18/orig 2025-09-07T06:39:17.5915173Z * [new branch] gh/skarjala/19/base -> origin/gh/skarjala/19/base 2025-09-07T06:39:17.5915246Z * [new branch] gh/skarjala/19/head -> origin/gh/skarjala/19/head 2025-09-07T06:39:17.5915318Z * [new branch] gh/skarjala/19/orig -> origin/gh/skarjala/19/orig 2025-09-07T06:39:17.5915391Z * [new branch] gh/slayton58/1/base -> 
origin/gh/slayton58/1/base 2025-09-07T06:39:17.5915507Z * [new branch] gh/slayton58/1/head -> origin/gh/slayton58/1/head 2025-09-07T06:39:17.5915579Z * [new branch] gh/slayton58/1/orig -> origin/gh/slayton58/1/orig 2025-09-07T06:39:17.5915652Z * [new branch] gh/slayton58/2/base -> origin/gh/slayton58/2/base 2025-09-07T06:39:17.5915725Z * [new branch] gh/slayton58/2/head -> origin/gh/slayton58/2/head 2025-09-07T06:39:17.5915797Z * [new branch] gh/slayton58/2/orig -> origin/gh/slayton58/2/orig 2025-09-07T06:39:17.5915869Z * [new branch] gh/slayton58/3/base -> origin/gh/slayton58/3/base 2025-09-07T06:39:17.5915942Z * [new branch] gh/slayton58/3/head -> origin/gh/slayton58/3/head 2025-09-07T06:39:17.5917191Z * [new branch] gh/slayton58/3/orig -> origin/gh/slayton58/3/orig 2025-09-07T06:39:17.5917269Z * [new branch] gh/slayton58/4/base -> origin/gh/slayton58/4/base 2025-09-07T06:39:17.5917343Z * [new branch] gh/slayton58/4/head -> origin/gh/slayton58/4/head 2025-09-07T06:39:17.5917415Z * [new branch] gh/slayton58/4/orig -> origin/gh/slayton58/4/orig 2025-09-07T06:39:17.5917488Z * [new branch] gh/slayton58/5/base -> origin/gh/slayton58/5/base 2025-09-07T06:39:17.5917561Z * [new branch] gh/slayton58/5/head -> origin/gh/slayton58/5/head 2025-09-07T06:39:17.5917633Z * [new branch] gh/slayton58/5/orig -> origin/gh/slayton58/5/orig 2025-09-07T06:39:17.5917710Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-09-07T06:39:17.5917787Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-09-07T06:39:17.5917864Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-09-07T06:39:17.5917940Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-09-07T06:39:17.5918085Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-09-07T06:39:17.5918160Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-09-07T06:39:17.5918236Z * [new branch] gh/soulitzer/287/base -> 
origin/gh/soulitzer/287/base 2025-09-07T06:39:17.5918312Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-09-07T06:39:17.5918387Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-09-07T06:39:17.5918463Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-09-07T06:39:17.5918539Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-09-07T06:39:17.5918614Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-09-07T06:39:17.5918691Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-09-07T06:39:17.5918765Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-09-07T06:39:17.5918840Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-09-07T06:39:17.5918985Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-09-07T06:39:17.5919061Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-09-07T06:39:17.5920200Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-09-07T06:39:17.5920280Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-09-07T06:39:17.5920355Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-09-07T06:39:17.5920431Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-09-07T06:39:17.5920564Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-09-07T06:39:17.5920639Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-09-07T06:39:17.5920718Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-09-07T06:39:17.5920793Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-09-07T06:39:17.5920868Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-09-07T06:39:17.5920944Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 
2025-09-07T06:39:17.5921019Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-09-07T06:39:17.5921094Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-09-07T06:39:17.5921170Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-09-07T06:39:17.5921247Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-09-07T06:39:17.5921322Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-09-07T06:39:17.5921399Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-09-07T06:39:17.5921474Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-09-07T06:39:17.5921549Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-09-07T06:39:17.5921625Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-09-07T06:39:17.5921700Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-09-07T06:39:17.5921776Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-09-07T06:39:17.5921854Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-09-07T06:39:17.5921929Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-09-07T06:39:17.5922004Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-09-07T06:39:17.5923126Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-09-07T06:39:17.5923205Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-09-07T06:39:17.5923280Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-09-07T06:39:17.5923356Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-09-07T06:39:17.5923431Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-09-07T06:39:17.5923506Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-09-07T06:39:17.5923583Z * [new 
branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-09-07T06:39:17.5923658Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-09-07T06:39:17.5923770Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-09-07T06:39:17.5923845Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-09-07T06:39:17.5923921Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-09-07T06:39:17.5923996Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-09-07T06:39:17.5924071Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-09-07T06:39:17.5924147Z * [new branch] gh/soulitzer/362/base -> origin/gh/soulitzer/362/base 2025-09-07T06:39:17.5924222Z * [new branch] gh/soulitzer/362/head -> origin/gh/soulitzer/362/head 2025-09-07T06:39:17.5924331Z * [new branch] gh/soulitzer/362/orig -> origin/gh/soulitzer/362/orig 2025-09-07T06:39:17.5924407Z * [new branch] gh/soulitzer/372/base -> origin/gh/soulitzer/372/base 2025-09-07T06:39:17.5924483Z * [new branch] gh/soulitzer/372/head -> origin/gh/soulitzer/372/head 2025-09-07T06:39:17.5924557Z * [new branch] gh/soulitzer/372/orig -> origin/gh/soulitzer/372/orig 2025-09-07T06:39:17.5924633Z * [new branch] gh/soulitzer/373/base -> origin/gh/soulitzer/373/base 2025-09-07T06:39:17.5924708Z * [new branch] gh/soulitzer/373/head -> origin/gh/soulitzer/373/head 2025-09-07T06:39:17.5924783Z * [new branch] gh/soulitzer/373/orig -> origin/gh/soulitzer/373/orig 2025-09-07T06:39:17.5924858Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-09-07T06:39:17.5924934Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-09-07T06:39:17.5925008Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-09-07T06:39:17.5926132Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-09-07T06:39:17.5926210Z * [new branch] gh/soulitzer/375/head -> 
origin/gh/soulitzer/375/head 2025-09-07T06:39:17.5926285Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-09-07T06:39:17.5926361Z * [new branch] gh/soulitzer/376/base -> origin/gh/soulitzer/376/base 2025-09-07T06:39:17.5926435Z * [new branch] gh/soulitzer/376/head -> origin/gh/soulitzer/376/head 2025-09-07T06:39:17.5926595Z * [new branch] gh/soulitzer/376/orig -> origin/gh/soulitzer/376/orig 2025-09-07T06:39:17.5926671Z * [new branch] gh/soulitzer/377/base -> origin/gh/soulitzer/377/base 2025-09-07T06:39:17.5926747Z * [new branch] gh/soulitzer/377/head -> origin/gh/soulitzer/377/head 2025-09-07T06:39:17.5926822Z * [new branch] gh/soulitzer/377/orig -> origin/gh/soulitzer/377/orig 2025-09-07T06:39:17.5926899Z * [new branch] gh/soulitzer/378/base -> origin/gh/soulitzer/378/base 2025-09-07T06:39:17.5926976Z * [new branch] gh/soulitzer/378/head -> origin/gh/soulitzer/378/head 2025-09-07T06:39:17.5927050Z * [new branch] gh/soulitzer/378/orig -> origin/gh/soulitzer/378/orig 2025-09-07T06:39:17.5927126Z * [new branch] gh/soulitzer/379/base -> origin/gh/soulitzer/379/base 2025-09-07T06:39:17.5927201Z * [new branch] gh/soulitzer/379/head -> origin/gh/soulitzer/379/head 2025-09-07T06:39:17.5927276Z * [new branch] gh/soulitzer/379/orig -> origin/gh/soulitzer/379/orig 2025-09-07T06:39:17.5927352Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-09-07T06:39:17.5927430Z * [new branch] gh/swolchok/767/base -> origin/gh/swolchok/767/base 2025-09-07T06:39:17.5927505Z * [new branch] gh/swolchok/767/head -> origin/gh/swolchok/767/head 2025-09-07T06:39:17.5927648Z * [new branch] gh/swolchok/767/orig -> origin/gh/swolchok/767/orig 2025-09-07T06:39:17.5927723Z * [new branch] gh/swolchok/768/base -> origin/gh/swolchok/768/base 2025-09-07T06:39:17.5927796Z * [new branch] gh/swolchok/768/head -> origin/gh/swolchok/768/head 2025-09-07T06:39:17.5927869Z * [new branch] gh/swolchok/768/orig -> origin/gh/swolchok/768/orig 
2025-09-07T06:39:17.5927942Z * [new branch] gh/swolchok/769/base -> origin/gh/swolchok/769/base 2025-09-07T06:39:17.5928016Z * [new branch] gh/swolchok/769/head -> origin/gh/swolchok/769/head 2025-09-07T06:39:17.5929157Z * [new branch] gh/swolchok/769/orig -> origin/gh/swolchok/769/orig 2025-09-07T06:39:17.5929282Z * [new branch] gh/swolchok/771/base -> origin/gh/swolchok/771/base 2025-09-07T06:39:17.5929355Z * [new branch] gh/swolchok/771/head -> origin/gh/swolchok/771/head 2025-09-07T06:39:17.5929430Z * [new branch] gh/swolchok/771/orig -> origin/gh/swolchok/771/orig 2025-09-07T06:39:17.5929503Z * [new branch] gh/swolchok/772/base -> origin/gh/swolchok/772/base 2025-09-07T06:39:17.5929576Z * [new branch] gh/swolchok/772/head -> origin/gh/swolchok/772/head 2025-09-07T06:39:17.5929649Z * [new branch] gh/swolchok/772/orig -> origin/gh/swolchok/772/orig 2025-09-07T06:39:17.5929722Z * [new branch] gh/swolchok/773/base -> origin/gh/swolchok/773/base 2025-09-07T06:39:17.5929795Z * [new branch] gh/swolchok/773/head -> origin/gh/swolchok/773/head 2025-09-07T06:39:17.5929869Z * [new branch] gh/swolchok/773/orig -> origin/gh/swolchok/773/orig 2025-09-07T06:39:17.5929943Z * [new branch] gh/swolchok/786/base -> origin/gh/swolchok/786/base 2025-09-07T06:39:17.5930016Z * [new branch] gh/swolchok/786/head -> origin/gh/swolchok/786/head 2025-09-07T06:39:17.5930089Z * [new branch] gh/swolchok/786/orig -> origin/gh/swolchok/786/orig 2025-09-07T06:39:17.5930162Z * [new branch] gh/swolchok/787/base -> origin/gh/swolchok/787/base 2025-09-07T06:39:17.5930235Z * [new branch] gh/swolchok/787/head -> origin/gh/swolchok/787/head 2025-09-07T06:39:17.5930307Z * [new branch] gh/swolchok/787/orig -> origin/gh/swolchok/787/orig 2025-09-07T06:39:17.5930381Z * [new branch] gh/swolchok/788/base -> origin/gh/swolchok/788/base 2025-09-07T06:39:17.5930453Z * [new branch] gh/swolchok/788/head -> origin/gh/swolchok/788/head 2025-09-07T06:39:17.5930528Z * [new branch] gh/swolchok/788/orig -> 
origin/gh/swolchok/788/orig 2025-09-07T06:39:17.5930600Z * [new branch] gh/swolchok/789/base -> origin/gh/swolchok/789/base 2025-09-07T06:39:17.5930675Z * [new branch] gh/swolchok/789/head -> origin/gh/swolchok/789/head 2025-09-07T06:39:17.5930748Z * [new branch] gh/swolchok/789/orig -> origin/gh/swolchok/789/orig 2025-09-07T06:39:17.5930821Z * [new branch] gh/swolchok/790/base -> origin/gh/swolchok/790/base 2025-09-07T06:39:17.5930895Z * [new branch] gh/swolchok/790/head -> origin/gh/swolchok/790/head 2025-09-07T06:39:17.5932017Z * [new branch] gh/swolchok/790/orig -> origin/gh/swolchok/790/orig 2025-09-07T06:39:17.5932092Z * [new branch] gh/swolchok/791/base -> origin/gh/swolchok/791/base 2025-09-07T06:39:17.5932166Z * [new branch] gh/swolchok/791/head -> origin/gh/swolchok/791/head 2025-09-07T06:39:17.5932241Z * [new branch] gh/swolchok/791/orig -> origin/gh/swolchok/791/orig 2025-09-07T06:39:17.5932313Z * [new branch] gh/swolchok/792/base -> origin/gh/swolchok/792/base 2025-09-07T06:39:17.5932422Z * [new branch] gh/swolchok/792/head -> origin/gh/swolchok/792/head 2025-09-07T06:39:17.5932495Z * [new branch] gh/swolchok/792/orig -> origin/gh/swolchok/792/orig 2025-09-07T06:39:17.5932567Z * [new branch] gh/swolchok/793/base -> origin/gh/swolchok/793/base 2025-09-07T06:39:17.5932640Z * [new branch] gh/swolchok/793/head -> origin/gh/swolchok/793/head 2025-09-07T06:39:17.5932713Z * [new branch] gh/swolchok/793/orig -> origin/gh/swolchok/793/orig 2025-09-07T06:39:17.5932785Z * [new branch] gh/swolchok/794/base -> origin/gh/swolchok/794/base 2025-09-07T06:39:17.5932859Z * [new branch] gh/swolchok/794/head -> origin/gh/swolchok/794/head 2025-09-07T06:39:17.5932963Z * [new branch] gh/swolchok/794/orig -> origin/gh/swolchok/794/orig 2025-09-07T06:39:17.5933036Z * [new branch] gh/swolchok/795/base -> origin/gh/swolchok/795/base 2025-09-07T06:39:17.5933111Z * [new branch] gh/swolchok/795/head -> origin/gh/swolchok/795/head 2025-09-07T06:39:17.5933185Z * [new branch] 
gh/swolchok/795/orig -> origin/gh/swolchok/795/orig 2025-09-07T06:39:17.5933258Z * [new branch] gh/swolchok/796/base -> origin/gh/swolchok/796/base 2025-09-07T06:39:17.5933332Z * [new branch] gh/swolchok/796/head -> origin/gh/swolchok/796/head 2025-09-07T06:39:17.5933405Z * [new branch] gh/swolchok/796/orig -> origin/gh/swolchok/796/orig 2025-09-07T06:39:17.5933477Z * [new branch] gh/swolchok/797/base -> origin/gh/swolchok/797/base 2025-09-07T06:39:17.5933553Z * [new branch] gh/swolchok/797/head -> origin/gh/swolchok/797/head 2025-09-07T06:39:17.5933626Z * [new branch] gh/swolchok/797/orig -> origin/gh/swolchok/797/orig 2025-09-07T06:39:17.5933699Z * [new branch] gh/swolchok/798/base -> origin/gh/swolchok/798/base 2025-09-07T06:39:17.5933773Z * [new branch] gh/swolchok/798/head -> origin/gh/swolchok/798/head 2025-09-07T06:39:17.5934889Z * [new branch] gh/swolchok/798/orig -> origin/gh/swolchok/798/orig 2025-09-07T06:39:17.5934965Z * [new branch] gh/swolchok/799/base -> origin/gh/swolchok/799/base 2025-09-07T06:39:17.5935038Z * [new branch] gh/swolchok/799/head -> origin/gh/swolchok/799/head 2025-09-07T06:39:17.5935111Z * [new branch] gh/swolchok/799/orig -> origin/gh/swolchok/799/orig 2025-09-07T06:39:17.5935184Z * [new branch] gh/swolchok/800/base -> origin/gh/swolchok/800/base 2025-09-07T06:39:17.5935259Z * [new branch] gh/swolchok/800/head -> origin/gh/swolchok/800/head 2025-09-07T06:39:17.5935332Z * [new branch] gh/swolchok/800/orig -> origin/gh/swolchok/800/orig 2025-09-07T06:39:17.5935406Z * [new branch] gh/swolchok/801/base -> origin/gh/swolchok/801/base 2025-09-07T06:39:17.5935479Z * [new branch] gh/swolchok/801/head -> origin/gh/swolchok/801/head 2025-09-07T06:39:17.5935553Z * [new branch] gh/swolchok/801/orig -> origin/gh/swolchok/801/orig 2025-09-07T06:39:17.5935626Z * [new branch] gh/swolchok/802/base -> origin/gh/swolchok/802/base 2025-09-07T06:39:17.5935699Z * [new branch] gh/swolchok/802/head -> origin/gh/swolchok/802/head 
2025-09-07T06:39:17.5935773Z * [new branch] gh/swolchok/802/orig -> origin/gh/swolchok/802/orig 2025-09-07T06:39:17.5935845Z * [new branch] gh/swolchok/803/base -> origin/gh/swolchok/803/base 2025-09-07T06:39:17.5935919Z * [new branch] gh/swolchok/803/head -> origin/gh/swolchok/803/head 2025-09-07T06:39:17.5935993Z * [new branch] gh/swolchok/803/orig -> origin/gh/swolchok/803/orig 2025-09-07T06:39:17.5936102Z * [new branch] gh/swolchok/804/base -> origin/gh/swolchok/804/base 2025-09-07T06:39:17.5936175Z * [new branch] gh/swolchok/804/head -> origin/gh/swolchok/804/head 2025-09-07T06:39:17.5936249Z * [new branch] gh/swolchok/804/orig -> origin/gh/swolchok/804/orig 2025-09-07T06:39:17.5936322Z * [new branch] gh/swolchok/805/base -> origin/gh/swolchok/805/base 2025-09-07T06:39:17.5936395Z * [new branch] gh/swolchok/805/head -> origin/gh/swolchok/805/head 2025-09-07T06:39:17.5936468Z * [new branch] gh/swolchok/805/orig -> origin/gh/swolchok/805/orig 2025-09-07T06:39:17.5936628Z * [new branch] gh/swolchok/806/base -> origin/gh/swolchok/806/base 2025-09-07T06:39:17.5936754Z * [new branch] gh/swolchok/806/head -> origin/gh/swolchok/806/head 2025-09-07T06:39:17.5937890Z * [new branch] gh/swolchok/806/orig -> origin/gh/swolchok/806/orig 2025-09-07T06:39:17.5937967Z * [new branch] gh/swolchok/807/base -> origin/gh/swolchok/807/base 2025-09-07T06:39:17.5938040Z * [new branch] gh/swolchok/807/head -> origin/gh/swolchok/807/head 2025-09-07T06:39:17.5938114Z * [new branch] gh/swolchok/807/orig -> origin/gh/swolchok/807/orig 2025-09-07T06:39:17.5938186Z * [new branch] gh/swolchok/808/base -> origin/gh/swolchok/808/base 2025-09-07T06:39:17.5938259Z * [new branch] gh/swolchok/808/head -> origin/gh/swolchok/808/head 2025-09-07T06:39:17.5938333Z * [new branch] gh/swolchok/808/orig -> origin/gh/swolchok/808/orig 2025-09-07T06:39:17.5938407Z * [new branch] gh/swolchok/809/base -> origin/gh/swolchok/809/base 2025-09-07T06:39:17.5938480Z * [new branch] gh/swolchok/809/head -> 
origin/gh/swolchok/809/head 2025-09-07T06:39:17.5938557Z * [new branch] gh/swolchok/809/orig -> origin/gh/swolchok/809/orig 2025-09-07T06:39:17.5938632Z * [new branch] gh/swolchok/810/base -> origin/gh/swolchok/810/base 2025-09-07T06:39:17.5938705Z * [new branch] gh/swolchok/810/head -> origin/gh/swolchok/810/head 2025-09-07T06:39:17.5938778Z * [new branch] gh/swolchok/810/orig -> origin/gh/swolchok/810/orig 2025-09-07T06:39:17.5938852Z * [new branch] gh/swolchok/811/base -> origin/gh/swolchok/811/base 2025-09-07T06:39:17.5938925Z * [new branch] gh/swolchok/811/head -> origin/gh/swolchok/811/head 2025-09-07T06:39:17.5938998Z * [new branch] gh/swolchok/811/orig -> origin/gh/swolchok/811/orig 2025-09-07T06:39:17.5939073Z * [new branch] gh/swolchok/812/base -> origin/gh/swolchok/812/base 2025-09-07T06:39:17.5939146Z * [new branch] gh/swolchok/812/head -> origin/gh/swolchok/812/head 2025-09-07T06:39:17.5939219Z * [new branch] gh/swolchok/812/orig -> origin/gh/swolchok/812/orig 2025-09-07T06:39:17.5939293Z * [new branch] gh/swolchok/813/base -> origin/gh/swolchok/813/base 2025-09-07T06:39:17.5939366Z * [new branch] gh/swolchok/813/head -> origin/gh/swolchok/813/head 2025-09-07T06:39:17.5939438Z * [new branch] gh/swolchok/813/orig -> origin/gh/swolchok/813/orig 2025-09-07T06:39:17.5939512Z * [new branch] gh/swolchok/814/base -> origin/gh/swolchok/814/base 2025-09-07T06:39:17.5939584Z * [new branch] gh/swolchok/814/head -> origin/gh/swolchok/814/head 2025-09-07T06:39:17.5939657Z * [new branch] gh/swolchok/814/orig -> origin/gh/swolchok/814/orig 2025-09-07T06:39:17.5940774Z * [new branch] gh/swolchok/815/base -> origin/gh/swolchok/815/base 2025-09-07T06:39:17.5940848Z * [new branch] gh/swolchok/815/head -> origin/gh/swolchok/815/head 2025-09-07T06:39:17.5940976Z * [new branch] gh/swolchok/815/orig -> origin/gh/swolchok/815/orig 2025-09-07T06:39:17.5941051Z * [new branch] gh/swolchok/816/base -> origin/gh/swolchok/816/base 2025-09-07T06:39:17.5941124Z * [new branch] 
gh/swolchok/816/head -> origin/gh/swolchok/816/head 2025-09-07T06:39:17.5941197Z * [new branch] gh/swolchok/816/orig -> origin/gh/swolchok/816/orig 2025-09-07T06:39:17.5941270Z * [new branch] gh/swolchok/817/base -> origin/gh/swolchok/817/base 2025-09-07T06:39:17.5941342Z * [new branch] gh/swolchok/817/head -> origin/gh/swolchok/817/head 2025-09-07T06:39:17.5941414Z * [new branch] gh/swolchok/817/orig -> origin/gh/swolchok/817/orig 2025-09-07T06:39:17.5941529Z * [new branch] gh/swolchok/818/base -> origin/gh/swolchok/818/base 2025-09-07T06:39:17.5941601Z * [new branch] gh/swolchok/818/head -> origin/gh/swolchok/818/head 2025-09-07T06:39:17.5941675Z * [new branch] gh/swolchok/818/orig -> origin/gh/swolchok/818/orig 2025-09-07T06:39:17.5941749Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-09-07T06:39:17.5941821Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-09-07T06:39:17.5941894Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-09-07T06:39:17.5941968Z * [new branch] gh/swolchok/820/base -> origin/gh/swolchok/820/base 2025-09-07T06:39:17.5942041Z * [new branch] gh/swolchok/820/head -> origin/gh/swolchok/820/head 2025-09-07T06:39:17.5942115Z * [new branch] gh/swolchok/820/orig -> origin/gh/swolchok/820/orig 2025-09-07T06:39:17.5942187Z * [new branch] gh/swolchok/821/base -> origin/gh/swolchok/821/base 2025-09-07T06:39:17.5942261Z * [new branch] gh/swolchok/821/head -> origin/gh/swolchok/821/head 2025-09-07T06:39:17.5942335Z * [new branch] gh/swolchok/821/orig -> origin/gh/swolchok/821/orig 2025-09-07T06:39:17.5942407Z * [new branch] gh/swolchok/822/base -> origin/gh/swolchok/822/base 2025-09-07T06:39:17.5942481Z * [new branch] gh/swolchok/822/head -> origin/gh/swolchok/822/head 2025-09-07T06:39:17.5942554Z * [new branch] gh/swolchok/822/orig -> origin/gh/swolchok/822/orig 2025-09-07T06:39:17.5943673Z * [new branch] gh/swolchok/823/base -> origin/gh/swolchok/823/base 
2025-09-07T06:39:17.5943750Z * [new branch] gh/swolchok/823/head -> origin/gh/swolchok/823/head 2025-09-07T06:39:17.5943824Z * [new branch] gh/swolchok/823/orig -> origin/gh/swolchok/823/orig 2025-09-07T06:39:17.5943897Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-09-07T06:39:17.5943971Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-09-07T06:39:17.5944043Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-09-07T06:39:17.5944115Z * [new branch] gh/swolchok/825/base -> origin/gh/swolchok/825/base 2025-09-07T06:39:17.5944189Z * [new branch] gh/swolchok/825/head -> origin/gh/swolchok/825/head 2025-09-07T06:39:17.5944261Z * [new branch] gh/swolchok/825/orig -> origin/gh/swolchok/825/orig 2025-09-07T06:39:17.5944335Z * [new branch] gh/swolchok/826/base -> origin/gh/swolchok/826/base 2025-09-07T06:39:17.5944408Z * [new branch] gh/swolchok/826/head -> origin/gh/swolchok/826/head 2025-09-07T06:39:17.5944482Z * [new branch] gh/swolchok/826/orig -> origin/gh/swolchok/826/orig 2025-09-07T06:39:17.5944555Z * [new branch] gh/swolchok/827/base -> origin/gh/swolchok/827/base 2025-09-07T06:39:17.5944670Z * [new branch] gh/swolchok/827/head -> origin/gh/swolchok/827/head 2025-09-07T06:39:17.5944743Z * [new branch] gh/swolchok/827/orig -> origin/gh/swolchok/827/orig 2025-09-07T06:39:17.5944816Z * [new branch] gh/swolchok/828/base -> origin/gh/swolchok/828/base 2025-09-07T06:39:17.5944890Z * [new branch] gh/swolchok/828/head -> origin/gh/swolchok/828/head 2025-09-07T06:39:17.5944963Z * [new branch] gh/swolchok/828/orig -> origin/gh/swolchok/828/orig 2025-09-07T06:39:17.5945035Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-09-07T06:39:17.5945109Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-09-07T06:39:17.5945215Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-09-07T06:39:17.5945288Z * [new branch] gh/swolchok/830/base -> 
origin/gh/swolchok/830/base 2025-09-07T06:39:17.5945362Z * [new branch] gh/swolchok/830/head -> origin/gh/swolchok/830/head 2025-09-07T06:39:17.5945436Z * [new branch] gh/swolchok/830/orig -> origin/gh/swolchok/830/orig 2025-09-07T06:39:17.5946633Z * [new branch] gh/swolchok/831/base -> origin/gh/swolchok/831/base 2025-09-07T06:39:17.5946711Z * [new branch] gh/swolchok/831/head -> origin/gh/swolchok/831/head 2025-09-07T06:39:17.5946784Z * [new branch] gh/swolchok/831/orig -> origin/gh/swolchok/831/orig 2025-09-07T06:39:17.5946856Z * [new branch] gh/swolchok/832/base -> origin/gh/swolchok/832/base 2025-09-07T06:39:17.5946932Z * [new branch] gh/swolchok/832/head -> origin/gh/swolchok/832/head 2025-09-07T06:39:17.5947005Z * [new branch] gh/swolchok/832/orig -> origin/gh/swolchok/832/orig 2025-09-07T06:39:17.5947082Z * [new branch] gh/syed-ahmed/3/base -> origin/gh/syed-ahmed/3/base 2025-09-07T06:39:17.5947156Z * [new branch] gh/syed-ahmed/3/head -> origin/gh/syed-ahmed/3/head 2025-09-07T06:39:17.5947228Z * [new branch] gh/syed-ahmed/3/orig -> origin/gh/syed-ahmed/3/orig 2025-09-07T06:39:17.5947300Z * [new branch] gh/syed-ahmed/4/base -> origin/gh/syed-ahmed/4/base 2025-09-07T06:39:17.5947371Z * [new branch] gh/syed-ahmed/4/head -> origin/gh/syed-ahmed/4/head 2025-09-07T06:39:17.5947444Z * [new branch] gh/syed-ahmed/4/orig -> origin/gh/syed-ahmed/4/orig 2025-09-07T06:39:17.5947515Z * [new branch] gh/syed-ahmed/5/base -> origin/gh/syed-ahmed/5/base 2025-09-07T06:39:17.5947588Z * [new branch] gh/syed-ahmed/5/head -> origin/gh/syed-ahmed/5/head 2025-09-07T06:39:17.5947660Z * [new branch] gh/syed-ahmed/5/orig -> origin/gh/syed-ahmed/5/orig 2025-09-07T06:39:17.5947736Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-09-07T06:39:17.5947808Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-09-07T06:39:17.5947880Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-09-07T06:39:17.5947952Z * [new branch] gh/tianyu-l/2/base 
-> origin/gh/tianyu-l/2/base 2025-09-07T06:39:17.5948022Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-09-07T06:39:17.5948092Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-09-07T06:39:17.5948161Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-09-07T06:39:17.5948232Z * [new branch] gh/tianyu-l/3/head -> origin/gh/tianyu-l/3/head 2025-09-07T06:39:17.5948302Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-09-07T06:39:17.5949474Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-09-07T06:39:17.5949548Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-09-07T06:39:17.5949618Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-09-07T06:39:17.5949710Z * [new branch] gh/tugsbayasgalan/1/base -> origin/gh/tugsbayasgalan/1/base 2025-09-07T06:39:17.5949796Z * [new branch] gh/tugsbayasgalan/1/head -> origin/gh/tugsbayasgalan/1/head 2025-09-07T06:39:17.5949882Z * [new branch] gh/tugsbayasgalan/1/orig -> origin/gh/tugsbayasgalan/1/orig 2025-09-07T06:39:17.5949973Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-09-07T06:39:17.5950112Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-09-07T06:39:17.5950197Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-09-07T06:39:17.5950282Z * [new branch] gh/tugsbayasgalan/11/base -> origin/gh/tugsbayasgalan/11/base 2025-09-07T06:39:17.5950366Z * [new branch] gh/tugsbayasgalan/11/head -> origin/gh/tugsbayasgalan/11/head 2025-09-07T06:39:17.5950451Z * [new branch] gh/tugsbayasgalan/11/orig -> origin/gh/tugsbayasgalan/11/orig 2025-09-07T06:39:17.5950534Z * [new branch] gh/tugsbayasgalan/12/base -> origin/gh/tugsbayasgalan/12/base 2025-09-07T06:39:17.5950617Z * [new branch] gh/tugsbayasgalan/12/head -> origin/gh/tugsbayasgalan/12/head 2025-09-07T06:39:17.5950700Z * [new branch] gh/tugsbayasgalan/12/orig -> 
origin/gh/tugsbayasgalan/12/orig 2025-09-07T06:39:17.5950785Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-09-07T06:39:17.5950868Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-09-07T06:39:17.5950952Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-09-07T06:39:17.5951036Z * [new branch] gh/tugsbayasgalan/14/base -> origin/gh/tugsbayasgalan/14/base 2025-09-07T06:39:17.5951119Z * [new branch] gh/tugsbayasgalan/14/head -> origin/gh/tugsbayasgalan/14/head 2025-09-07T06:39:17.5951201Z * [new branch] gh/tugsbayasgalan/14/orig -> origin/gh/tugsbayasgalan/14/orig 2025-09-07T06:39:17.5951285Z * [new branch] gh/tugsbayasgalan/15/base -> origin/gh/tugsbayasgalan/15/base 2025-09-07T06:39:17.5951367Z * [new branch] gh/tugsbayasgalan/15/head -> origin/gh/tugsbayasgalan/15/head 2025-09-07T06:39:17.5951452Z * [new branch] gh/tugsbayasgalan/15/orig -> origin/gh/tugsbayasgalan/15/orig 2025-09-07T06:39:17.5952589Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-09-07T06:39:17.5952676Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-09-07T06:39:17.5952761Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-09-07T06:39:17.5952844Z * [new branch] gh/tugsbayasgalan/3/base -> origin/gh/tugsbayasgalan/3/base 2025-09-07T06:39:17.5952927Z * [new branch] gh/tugsbayasgalan/3/head -> origin/gh/tugsbayasgalan/3/head 2025-09-07T06:39:17.5953010Z * [new branch] gh/tugsbayasgalan/3/orig -> origin/gh/tugsbayasgalan/3/orig 2025-09-07T06:39:17.5953094Z * [new branch] gh/tugsbayasgalan/4/base -> origin/gh/tugsbayasgalan/4/base 2025-09-07T06:39:17.5953176Z * [new branch] gh/tugsbayasgalan/4/head -> origin/gh/tugsbayasgalan/4/head 2025-09-07T06:39:17.5953259Z * [new branch] gh/tugsbayasgalan/4/orig -> origin/gh/tugsbayasgalan/4/orig 2025-09-07T06:39:17.5953342Z * [new branch] gh/tugsbayasgalan/5/base -> 
origin/gh/tugsbayasgalan/5/base 2025-09-07T06:39:17.5953459Z * [new branch] gh/tugsbayasgalan/5/head -> origin/gh/tugsbayasgalan/5/head 2025-09-07T06:39:17.5953541Z * [new branch] gh/tugsbayasgalan/5/orig -> origin/gh/tugsbayasgalan/5/orig 2025-09-07T06:39:17.5953624Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-09-07T06:39:17.5953706Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-09-07T06:39:17.5953788Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-09-07T06:39:17.5953870Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-09-07T06:39:17.5953980Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-09-07T06:39:17.5954062Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-09-07T06:39:17.5954144Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-09-07T06:39:17.5954228Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-09-07T06:39:17.5954309Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-09-07T06:39:17.5954393Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-09-07T06:39:17.5954474Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-09-07T06:39:17.5954557Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-09-07T06:39:17.5955673Z * [new branch] gh/v0i0/1/base -> origin/gh/v0i0/1/base 2025-09-07T06:39:17.5955747Z * [new branch] gh/v0i0/1/head -> origin/gh/v0i0/1/head 2025-09-07T06:39:17.5955814Z * [new branch] gh/v0i0/1/orig -> origin/gh/v0i0/1/orig 2025-09-07T06:39:17.5955882Z * [new branch] gh/v0i0/4/base -> origin/gh/v0i0/4/base 2025-09-07T06:39:17.5955946Z * [new branch] gh/v0i0/4/head -> origin/gh/v0i0/4/head 2025-09-07T06:39:17.5956009Z * [new branch] gh/v0i0/4/orig -> origin/gh/v0i0/4/orig 
2025-09-07T06:39:17.5956073Z * [new branch] gh/v0i0/6/base -> origin/gh/v0i0/6/base 2025-09-07T06:39:17.5956138Z * [new branch] gh/v0i0/6/head -> origin/gh/v0i0/6/head 2025-09-07T06:39:17.5956202Z * [new branch] gh/v0i0/6/orig -> origin/gh/v0i0/6/orig 2025-09-07T06:39:17.5956265Z * [new branch] gh/v0i0/7/base -> origin/gh/v0i0/7/base 2025-09-07T06:39:17.5956331Z * [new branch] gh/v0i0/7/head -> origin/gh/v0i0/7/head 2025-09-07T06:39:17.5956395Z * [new branch] gh/v0i0/7/orig -> origin/gh/v0i0/7/orig 2025-09-07T06:39:17.5956458Z * [new branch] gh/v0i0/8/base -> origin/gh/v0i0/8/base 2025-09-07T06:39:17.5956599Z * [new branch] gh/v0i0/8/head -> origin/gh/v0i0/8/head 2025-09-07T06:39:17.5956663Z * [new branch] gh/v0i0/8/orig -> origin/gh/v0i0/8/orig 2025-09-07T06:39:17.5956727Z * [new branch] gh/v0i0/9/base -> origin/gh/v0i0/9/base 2025-09-07T06:39:17.5956792Z * [new branch] gh/v0i0/9/head -> origin/gh/v0i0/9/head 2025-09-07T06:39:17.5956855Z * [new branch] gh/v0i0/9/orig -> origin/gh/v0i0/9/orig 2025-09-07T06:39:17.5956925Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-09-07T06:39:17.5956997Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-09-07T06:39:17.5957064Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-09-07T06:39:17.5957130Z * [new branch] gh/vkuzo/4/base -> origin/gh/vkuzo/4/base 2025-09-07T06:39:17.5957252Z * [new branch] gh/vkuzo/4/head -> origin/gh/vkuzo/4/head 2025-09-07T06:39:17.5957319Z * [new branch] gh/vkuzo/4/orig -> origin/gh/vkuzo/4/orig 2025-09-07T06:39:17.5957385Z * [new branch] gh/vkuzo/5/base -> origin/gh/vkuzo/5/base 2025-09-07T06:39:17.5958562Z * [new branch] gh/vkuzo/5/head -> origin/gh/vkuzo/5/head 2025-09-07T06:39:17.5958631Z * [new branch] gh/vkuzo/5/orig -> origin/gh/vkuzo/5/orig 2025-09-07T06:39:17.5958697Z * [new branch] gh/vkuzo/6/base -> origin/gh/vkuzo/6/base 2025-09-07T06:39:17.5958835Z * [new branch] gh/vkuzo/6/head -> origin/gh/vkuzo/6/head 2025-09-07T06:39:17.5958902Z * [new branch] 
gh/vkuzo/6/orig -> origin/gh/vkuzo/6/orig 2025-09-07T06:39:17.5958968Z * [new branch] gh/vkuzo/7/base -> origin/gh/vkuzo/7/base 2025-09-07T06:39:17.5959036Z * [new branch] gh/vkuzo/7/head -> origin/gh/vkuzo/7/head 2025-09-07T06:39:17.5959101Z * [new branch] gh/vkuzo/7/orig -> origin/gh/vkuzo/7/orig 2025-09-07T06:39:17.5959179Z * [new branch] gh/wconstab/419/base -> origin/gh/wconstab/419/base 2025-09-07T06:39:17.5959256Z * [new branch] gh/wconstab/419/head -> origin/gh/wconstab/419/head 2025-09-07T06:39:17.5959332Z * [new branch] gh/wconstab/419/orig -> origin/gh/wconstab/419/orig 2025-09-07T06:39:17.5959406Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-09-07T06:39:17.5959482Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-09-07T06:39:17.5959556Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-09-07T06:39:17.5959628Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-09-07T06:39:17.5959703Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-09-07T06:39:17.5959776Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-09-07T06:39:17.5959849Z * [new branch] gh/wconstab/438/base -> origin/gh/wconstab/438/base 2025-09-07T06:39:17.5959923Z * [new branch] gh/wconstab/438/head -> origin/gh/wconstab/438/head 2025-09-07T06:39:17.5959996Z * [new branch] gh/wconstab/438/orig -> origin/gh/wconstab/438/orig 2025-09-07T06:39:17.5960068Z * [new branch] gh/wconstab/440/base -> origin/gh/wconstab/440/base 2025-09-07T06:39:17.5960143Z * [new branch] gh/wconstab/440/head -> origin/gh/wconstab/440/head 2025-09-07T06:39:17.5960217Z * [new branch] gh/wconstab/440/orig -> origin/gh/wconstab/440/orig 2025-09-07T06:39:17.5960291Z * [new branch] gh/wconstab/441/base -> origin/gh/wconstab/441/base 2025-09-07T06:39:17.5961417Z * [new branch] gh/wconstab/441/head -> origin/gh/wconstab/441/head 2025-09-07T06:39:17.5961494Z * [new branch] gh/wconstab/441/orig -> 
origin/gh/wconstab/441/orig 2025-09-07T06:39:17.5961567Z * [new branch] gh/wconstab/442/base -> origin/gh/wconstab/442/base 2025-09-07T06:39:17.5961639Z * [new branch] gh/wconstab/442/head -> origin/gh/wconstab/442/head 2025-09-07T06:39:17.5961712Z * [new branch] gh/wconstab/442/orig -> origin/gh/wconstab/442/orig 2025-09-07T06:39:17.5961785Z * [new branch] gh/wconstab/443/base -> origin/gh/wconstab/443/base 2025-09-07T06:39:17.5961860Z * [new branch] gh/wconstab/443/head -> origin/gh/wconstab/443/head 2025-09-07T06:39:17.5961934Z * [new branch] gh/wconstab/443/orig -> origin/gh/wconstab/443/orig 2025-09-07T06:39:17.5962044Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-09-07T06:39:17.5962118Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-09-07T06:39:17.5962192Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-09-07T06:39:17.5962265Z * [new branch] gh/wconstab/445/base -> origin/gh/wconstab/445/base 2025-09-07T06:39:17.5962338Z * [new branch] gh/wconstab/445/head -> origin/gh/wconstab/445/head 2025-09-07T06:39:17.5962412Z * [new branch] gh/wconstab/445/orig -> origin/gh/wconstab/445/orig 2025-09-07T06:39:17.5962485Z * [new branch] gh/wconstab/446/base -> origin/gh/wconstab/446/base 2025-09-07T06:39:17.5962587Z * [new branch] gh/wconstab/446/head -> origin/gh/wconstab/446/head 2025-09-07T06:39:17.5962660Z * [new branch] gh/wconstab/446/orig -> origin/gh/wconstab/446/orig 2025-09-07T06:39:17.5962735Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-09-07T06:39:17.5962808Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-09-07T06:39:17.5962881Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-09-07T06:39:17.5962958Z * [new branch] gh/weifengpy/27/base -> origin/gh/weifengpy/27/base 2025-09-07T06:39:17.5963032Z * [new branch] gh/weifengpy/27/head -> origin/gh/weifengpy/27/head 2025-09-07T06:39:17.5963105Z * [new branch] 
gh/weifengpy/27/orig -> origin/gh/weifengpy/27/orig 2025-09-07T06:39:17.5963185Z * [new branch] gh/weifengpy/30/base -> origin/gh/weifengpy/30/base 2025-09-07T06:39:17.5964356Z * [new branch] gh/weifengpy/30/head -> origin/gh/weifengpy/30/head 2025-09-07T06:39:17.5964433Z * [new branch] gh/weifengpy/30/orig -> origin/gh/weifengpy/30/orig 2025-09-07T06:39:17.5964521Z * [new branch] gh/williamwen42/196/base -> origin/gh/williamwen42/196/base 2025-09-07T06:39:17.5964604Z * [new branch] gh/williamwen42/196/head -> origin/gh/williamwen42/196/head 2025-09-07T06:39:17.5964684Z * [new branch] gh/williamwen42/196/orig -> origin/gh/williamwen42/196/orig 2025-09-07T06:39:17.5964766Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-09-07T06:39:17.5964846Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-09-07T06:39:17.5964927Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-09-07T06:39:17.5965008Z * [new branch] gh/williamwen42/258/base -> origin/gh/williamwen42/258/base 2025-09-07T06:39:17.5965089Z * [new branch] gh/williamwen42/258/head -> origin/gh/williamwen42/258/head 2025-09-07T06:39:17.5965170Z * [new branch] gh/williamwen42/258/orig -> origin/gh/williamwen42/258/orig 2025-09-07T06:39:17.5965252Z * [new branch] gh/williamwen42/266/base -> origin/gh/williamwen42/266/base 2025-09-07T06:39:17.5965332Z * [new branch] gh/williamwen42/266/head -> origin/gh/williamwen42/266/head 2025-09-07T06:39:17.5965411Z * [new branch] gh/williamwen42/266/orig -> origin/gh/williamwen42/266/orig 2025-09-07T06:39:17.5965492Z * [new branch] gh/williamwen42/267/base -> origin/gh/williamwen42/267/base 2025-09-07T06:39:17.5965572Z * [new branch] gh/williamwen42/267/head -> origin/gh/williamwen42/267/head 2025-09-07T06:39:17.5965653Z * [new branch] gh/williamwen42/267/orig -> origin/gh/williamwen42/267/orig 2025-09-07T06:39:17.5965735Z * [new branch] gh/williamwen42/270/base -> 
origin/gh/williamwen42/270/base 2025-09-07T06:39:17.5965815Z * [new branch] gh/williamwen42/270/head -> origin/gh/williamwen42/270/head 2025-09-07T06:39:17.5965927Z * [new branch] gh/williamwen42/270/orig -> origin/gh/williamwen42/270/orig 2025-09-07T06:39:17.5966007Z * [new branch] gh/williamwen42/271/base -> origin/gh/williamwen42/271/base 2025-09-07T06:39:17.5966088Z * [new branch] gh/williamwen42/271/head -> origin/gh/williamwen42/271/head 2025-09-07T06:39:17.5966168Z * [new branch] gh/williamwen42/271/orig -> origin/gh/williamwen42/271/orig 2025-09-07T06:39:17.5966248Z * [new branch] gh/williamwen42/272/base -> origin/gh/williamwen42/272/base 2025-09-07T06:39:17.5967464Z * [new branch] gh/williamwen42/272/head -> origin/gh/williamwen42/272/head 2025-09-07T06:39:17.5967612Z * [new branch] gh/williamwen42/272/orig -> origin/gh/williamwen42/272/orig 2025-09-07T06:39:17.5967693Z * [new branch] gh/williamwen42/274/base -> origin/gh/williamwen42/274/base 2025-09-07T06:39:17.5967774Z * [new branch] gh/williamwen42/274/head -> origin/gh/williamwen42/274/head 2025-09-07T06:39:17.5967854Z * [new branch] gh/williamwen42/274/orig -> origin/gh/williamwen42/274/orig 2025-09-07T06:39:17.5967934Z * [new branch] gh/williamwen42/275/base -> origin/gh/williamwen42/275/base 2025-09-07T06:39:17.5968016Z * [new branch] gh/williamwen42/275/head -> origin/gh/williamwen42/275/head 2025-09-07T06:39:17.5968096Z * [new branch] gh/williamwen42/276/base -> origin/gh/williamwen42/276/base 2025-09-07T06:39:17.5968176Z * [new branch] gh/williamwen42/276/head -> origin/gh/williamwen42/276/head 2025-09-07T06:39:17.5968258Z * [new branch] gh/williamwen42/276/orig -> origin/gh/williamwen42/276/orig 2025-09-07T06:39:17.5968337Z * [new branch] gh/williamwen42/277/base -> origin/gh/williamwen42/277/base 2025-09-07T06:39:17.5968417Z * [new branch] gh/williamwen42/277/head -> origin/gh/williamwen42/277/head 2025-09-07T06:39:17.5968499Z * [new branch] gh/williamwen42/277/orig -> 
origin/gh/williamwen42/277/orig 2025-09-07T06:39:17.5968579Z * [new branch] gh/williamwen42/278/base -> origin/gh/williamwen42/278/base 2025-09-07T06:39:17.5968658Z * [new branch] gh/williamwen42/278/head -> origin/gh/williamwen42/278/head 2025-09-07T06:39:17.5968739Z * [new branch] gh/williamwen42/278/orig -> origin/gh/williamwen42/278/orig 2025-09-07T06:39:17.5968818Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-09-07T06:39:17.5968898Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-09-07T06:39:17.5968980Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-09-07T06:39:17.5969060Z * [new branch] gh/williamwen42/280/base -> origin/gh/williamwen42/280/base 2025-09-07T06:39:17.5969142Z * [new branch] gh/williamwen42/280/head -> origin/gh/williamwen42/280/head 2025-09-07T06:39:17.5969222Z * [new branch] gh/williamwen42/280/orig -> origin/gh/williamwen42/280/orig 2025-09-07T06:39:17.5969302Z * [new branch] gh/williamwen42/281/base -> origin/gh/williamwen42/281/base 2025-09-07T06:39:17.5969382Z * [new branch] gh/williamwen42/281/head -> origin/gh/williamwen42/281/head 2025-09-07T06:39:17.5970513Z * [new branch] gh/williamwen42/281/orig -> origin/gh/williamwen42/281/orig 2025-09-07T06:39:17.5970596Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-09-07T06:39:17.5970678Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-09-07T06:39:17.5970759Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-09-07T06:39:17.5970839Z * [new branch] gh/williamwen42/283/base -> origin/gh/williamwen42/283/base 2025-09-07T06:39:17.5970972Z * [new branch] gh/williamwen42/283/head -> origin/gh/williamwen42/283/head 2025-09-07T06:39:17.5971054Z * [new branch] gh/williamwen42/283/orig -> origin/gh/williamwen42/283/orig 2025-09-07T06:39:17.5971133Z * [new branch] gh/williamwen42/284/base -> 
origin/gh/williamwen42/284/base 2025-09-07T06:39:17.5971213Z * [new branch] gh/williamwen42/284/head -> origin/gh/williamwen42/284/head 2025-09-07T06:39:17.5971293Z * [new branch] gh/williamwen42/284/orig -> origin/gh/williamwen42/284/orig 2025-09-07T06:39:17.5971373Z * [new branch] gh/williamwen42/285/base -> origin/gh/williamwen42/285/base 2025-09-07T06:39:17.5971480Z * [new branch] gh/williamwen42/285/head -> origin/gh/williamwen42/285/head 2025-09-07T06:39:17.5971562Z * [new branch] gh/williamwen42/285/orig -> origin/gh/williamwen42/285/orig 2025-09-07T06:39:17.5971643Z * [new branch] gh/williamwen42/286/base -> origin/gh/williamwen42/286/base 2025-09-07T06:39:17.5971723Z * [new branch] gh/williamwen42/286/head -> origin/gh/williamwen42/286/head 2025-09-07T06:39:17.5971803Z * [new branch] gh/williamwen42/286/orig -> origin/gh/williamwen42/286/orig 2025-09-07T06:39:17.5971884Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-09-07T06:39:17.5971963Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-09-07T06:39:17.5972043Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-09-07T06:39:17.5972126Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-09-07T06:39:17.5972206Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-09-07T06:39:17.5972287Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-09-07T06:39:17.5972368Z * [new branch] gh/williamwen42/289/base -> origin/gh/williamwen42/289/base 2025-09-07T06:39:17.5972448Z * [new branch] gh/williamwen42/289/head -> origin/gh/williamwen42/289/head 2025-09-07T06:39:17.5972527Z * [new branch] gh/williamwen42/289/orig -> origin/gh/williamwen42/289/orig 2025-09-07T06:39:17.5973662Z * [new branch] gh/wychi/1/base -> origin/gh/wychi/1/base 2025-09-07T06:39:17.5973731Z * [new branch] gh/wychi/1/head -> origin/gh/wychi/1/head 
2025-09-07T06:39:17.5973798Z * [new branch] gh/wychi/1/orig -> origin/gh/wychi/1/orig 2025-09-07T06:39:17.5973873Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-09-07T06:39:17.5973943Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-09-07T06:39:17.5974014Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-09-07T06:39:17.5974082Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-09-07T06:39:17.5974154Z * [new branch] gh/xmfan/18/base -> origin/gh/xmfan/18/base 2025-09-07T06:39:17.5974223Z * [new branch] gh/xmfan/18/head -> origin/gh/xmfan/18/head 2025-09-07T06:39:17.5974294Z * [new branch] gh/xmfan/229/base -> origin/gh/xmfan/229/base 2025-09-07T06:39:17.5974362Z * [new branch] gh/xmfan/229/head -> origin/gh/xmfan/229/head 2025-09-07T06:39:17.5974430Z * [new branch] gh/xmfan/229/orig -> origin/gh/xmfan/229/orig 2025-09-07T06:39:17.5974500Z * [new branch] gh/xmfan/237/base -> origin/gh/xmfan/237/base 2025-09-07T06:39:17.5974568Z * [new branch] gh/xmfan/237/head -> origin/gh/xmfan/237/head 2025-09-07T06:39:17.5974685Z * [new branch] gh/xmfan/237/orig -> origin/gh/xmfan/237/orig 2025-09-07T06:39:17.5974754Z * [new branch] gh/xmfan/244/base -> origin/gh/xmfan/244/base 2025-09-07T06:39:17.5974822Z * [new branch] gh/xmfan/244/head -> origin/gh/xmfan/244/head 2025-09-07T06:39:17.5974891Z * [new branch] gh/xmfan/244/orig -> origin/gh/xmfan/244/orig 2025-09-07T06:39:17.5974958Z * [new branch] gh/xmfan/246/base -> origin/gh/xmfan/246/base 2025-09-07T06:39:17.5975031Z * [new branch] gh/xmfan/246/head -> origin/gh/xmfan/246/head 2025-09-07T06:39:17.5975102Z * [new branch] gh/xmfan/246/orig -> origin/gh/xmfan/246/orig 2025-09-07T06:39:17.5975244Z * [new branch] gh/xmfan/253/base -> origin/gh/xmfan/253/base 2025-09-07T06:39:17.5975314Z * [new branch] gh/xmfan/253/head -> origin/gh/xmfan/253/head 2025-09-07T06:39:17.5975383Z * [new branch] gh/xmfan/253/orig -> origin/gh/xmfan/253/orig 
2025-09-07T06:39:17.5976588Z * [new branch] gh/xmfan/254/base -> origin/gh/xmfan/254/base 2025-09-07T06:39:17.5976661Z * [new branch] gh/xmfan/254/head -> origin/gh/xmfan/254/head 2025-09-07T06:39:17.5976728Z * [new branch] gh/xmfan/254/orig -> origin/gh/xmfan/254/orig 2025-09-07T06:39:17.5976796Z * [new branch] gh/xmfan/260/base -> origin/gh/xmfan/260/base 2025-09-07T06:39:17.5976865Z * [new branch] gh/xmfan/260/head -> origin/gh/xmfan/260/head 2025-09-07T06:39:17.5976934Z * [new branch] gh/xmfan/260/orig -> origin/gh/xmfan/260/orig 2025-09-07T06:39:17.5977005Z * [new branch] gh/xmfan/262/base -> origin/gh/xmfan/262/base 2025-09-07T06:39:17.5977074Z * [new branch] gh/xmfan/262/head -> origin/gh/xmfan/262/head 2025-09-07T06:39:17.5977143Z * [new branch] gh/xmfan/262/orig -> origin/gh/xmfan/262/orig 2025-09-07T06:39:17.5977211Z * [new branch] gh/xmfan/263/base -> origin/gh/xmfan/263/base 2025-09-07T06:39:17.5977280Z * [new branch] gh/xmfan/263/head -> origin/gh/xmfan/263/head 2025-09-07T06:39:17.5977349Z * [new branch] gh/xmfan/263/orig -> origin/gh/xmfan/263/orig 2025-09-07T06:39:17.5977416Z * [new branch] gh/xmfan/264/base -> origin/gh/xmfan/264/base 2025-09-07T06:39:17.5977485Z * [new branch] gh/xmfan/264/head -> origin/gh/xmfan/264/head 2025-09-07T06:39:17.5977552Z * [new branch] gh/xmfan/264/orig -> origin/gh/xmfan/264/orig 2025-09-07T06:39:17.5977622Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-09-07T06:39:17.5977692Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-09-07T06:39:17.5977761Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-09-07T06:39:17.5977829Z * [new branch] gh/xmfan/276/base -> origin/gh/xmfan/276/base 2025-09-07T06:39:17.5977898Z * [new branch] gh/xmfan/276/head -> origin/gh/xmfan/276/head 2025-09-07T06:39:17.5977967Z * [new branch] gh/xmfan/276/orig -> origin/gh/xmfan/276/orig 2025-09-07T06:39:17.5978035Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 
2025-09-07T06:39:17.5978103Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-09-07T06:39:17.5978172Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-09-07T06:39:17.5979287Z * [new branch] gh/xmfan/278/base -> origin/gh/xmfan/278/base 2025-09-07T06:39:17.5979359Z * [new branch] gh/xmfan/278/head -> origin/gh/xmfan/278/head 2025-09-07T06:39:17.5979485Z * [new branch] gh/xmfan/278/orig -> origin/gh/xmfan/278/orig 2025-09-07T06:39:17.5979555Z * [new branch] gh/xmfan/279/base -> origin/gh/xmfan/279/base 2025-09-07T06:39:17.5979622Z * [new branch] gh/xmfan/279/head -> origin/gh/xmfan/279/head 2025-09-07T06:39:17.5979691Z * [new branch] gh/xmfan/279/orig -> origin/gh/xmfan/279/orig 2025-09-07T06:39:17.5979759Z * [new branch] gh/xmfan/280/base -> origin/gh/xmfan/280/base 2025-09-07T06:39:17.5979827Z * [new branch] gh/xmfan/280/head -> origin/gh/xmfan/280/head 2025-09-07T06:39:17.5979896Z * [new branch] gh/xmfan/280/orig -> origin/gh/xmfan/280/orig 2025-09-07T06:39:17.5980020Z * [new branch] gh/xmfan/281/base -> origin/gh/xmfan/281/base 2025-09-07T06:39:17.5980087Z * [new branch] gh/xmfan/281/head -> origin/gh/xmfan/281/head 2025-09-07T06:39:17.5980158Z * [new branch] gh/xmfan/281/orig -> origin/gh/xmfan/281/orig 2025-09-07T06:39:17.5980226Z * [new branch] gh/xmfan/282/base -> origin/gh/xmfan/282/base 2025-09-07T06:39:17.5980294Z * [new branch] gh/xmfan/282/head -> origin/gh/xmfan/282/head 2025-09-07T06:39:17.5980363Z * [new branch] gh/xmfan/283/base -> origin/gh/xmfan/283/base 2025-09-07T06:39:17.5980431Z * [new branch] gh/xmfan/283/head -> origin/gh/xmfan/283/head 2025-09-07T06:39:17.5980499Z * [new branch] gh/xmfan/283/orig -> origin/gh/xmfan/283/orig 2025-09-07T06:39:17.5980582Z * [new branch] gh/xuanzhang816/14/base -> origin/gh/xuanzhang816/14/base 2025-09-07T06:39:17.5980665Z * [new branch] gh/xuanzhang816/14/head -> origin/gh/xuanzhang816/14/head 2025-09-07T06:39:17.5980743Z * [new branch] gh/xuanzhang816/14/orig -> 
origin/gh/xuanzhang816/14/orig 2025-09-07T06:39:17.5980824Z * [new branch] gh/xuanzhang816/19/base -> origin/gh/xuanzhang816/19/base 2025-09-07T06:39:17.5980902Z * [new branch] gh/xuanzhang816/19/head -> origin/gh/xuanzhang816/19/head 2025-09-07T06:39:17.5980980Z * [new branch] gh/xuanzhang816/19/orig -> origin/gh/xuanzhang816/19/orig 2025-09-07T06:39:17.5981058Z * [new branch] gh/xuanzhang816/22/base -> origin/gh/xuanzhang816/22/base 2025-09-07T06:39:17.5982183Z * [new branch] gh/xuanzhang816/22/head -> origin/gh/xuanzhang816/22/head 2025-09-07T06:39:17.5982264Z * [new branch] gh/xuanzhang816/22/orig -> origin/gh/xuanzhang816/22/orig 2025-09-07T06:39:17.5982343Z * [new branch] gh/xuanzhang816/23/base -> origin/gh/xuanzhang816/23/base 2025-09-07T06:39:17.5982420Z * [new branch] gh/xuanzhang816/23/head -> origin/gh/xuanzhang816/23/head 2025-09-07T06:39:17.5982497Z * [new branch] gh/xuanzhang816/23/orig -> origin/gh/xuanzhang816/23/orig 2025-09-07T06:39:17.5982577Z * [new branch] gh/xuanzhang816/24/base -> origin/gh/xuanzhang816/24/base 2025-09-07T06:39:17.5982654Z * [new branch] gh/xuanzhang816/24/head -> origin/gh/xuanzhang816/24/head 2025-09-07T06:39:17.5982732Z * [new branch] gh/xuanzhang816/24/orig -> origin/gh/xuanzhang816/24/orig 2025-09-07T06:39:17.5982810Z * [new branch] gh/xuanzhang816/25/base -> origin/gh/xuanzhang816/25/base 2025-09-07T06:39:17.5982887Z * [new branch] gh/xuanzhang816/25/head -> origin/gh/xuanzhang816/25/head 2025-09-07T06:39:17.5982965Z * [new branch] gh/xuanzhang816/25/orig -> origin/gh/xuanzhang816/25/orig 2025-09-07T06:39:17.5983043Z * [new branch] gh/xuanzhang816/26/base -> origin/gh/xuanzhang816/26/base 2025-09-07T06:39:17.5983121Z * [new branch] gh/xuanzhang816/26/head -> origin/gh/xuanzhang816/26/head 2025-09-07T06:39:17.5983240Z * [new branch] gh/xuanzhang816/26/orig -> origin/gh/xuanzhang816/26/orig 2025-09-07T06:39:17.5983319Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-09-07T06:39:17.5983395Z * [new 
branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-09-07T06:39:17.5983467Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-09-07T06:39:17.5983540Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-09-07T06:39:17.5983614Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-09-07T06:39:17.5983716Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-09-07T06:39:17.5983788Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-09-07T06:39:17.5983861Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-09-07T06:39:17.5983934Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-09-07T06:39:17.5984006Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-09-07T06:39:17.5985118Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-09-07T06:39:17.5985193Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-09-07T06:39:17.5985265Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-09-07T06:39:17.5985337Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-09-07T06:39:17.5985410Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-09-07T06:39:17.5985481Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-09-07T06:39:17.5985554Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-09-07T06:39:17.5985628Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-09-07T06:39:17.5985699Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-09-07T06:39:17.5985772Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-09-07T06:39:17.5985843Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-09-07T06:39:17.5985915Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 
2025-09-07T06:39:17.5985988Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-09-07T06:39:17.5986062Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-09-07T06:39:17.5986134Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-09-07T06:39:17.5986207Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-09-07T06:39:17.5986279Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-09-07T06:39:17.5986351Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-09-07T06:39:17.5986423Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-09-07T06:39:17.5986589Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-09-07T06:39:17.5986662Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-09-07T06:39:17.5986733Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-09-07T06:39:17.5986808Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-09-07T06:39:17.5986879Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-09-07T06:39:17.5987012Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-09-07T06:39:17.5988156Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-09-07T06:39:17.5988228Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-09-07T06:39:17.5988299Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-09-07T06:39:17.5988371Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-09-07T06:39:17.5988443Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-09-07T06:39:17.5988514Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-09-07T06:39:17.5988641Z * [new branch] gh/yanbing-j/36/base -> origin/gh/yanbing-j/36/base 2025-09-07T06:39:17.5988713Z * [new branch] gh/yanbing-j/36/head -> 
origin/gh/yanbing-j/36/head 2025-09-07T06:39:17.5988786Z * [new branch] gh/yanbing-j/36/orig -> origin/gh/yanbing-j/36/orig 2025-09-07T06:39:17.5988859Z * [new branch] gh/yanbing-j/37/base -> origin/gh/yanbing-j/37/base 2025-09-07T06:39:17.5988930Z * [new branch] gh/yanbing-j/37/head -> origin/gh/yanbing-j/37/head 2025-09-07T06:39:17.5989002Z * [new branch] gh/yanbing-j/37/orig -> origin/gh/yanbing-j/37/orig 2025-09-07T06:39:17.5989077Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-09-07T06:39:17.5989149Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-09-07T06:39:17.5989222Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-09-07T06:39:17.5989296Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-09-07T06:39:17.5989367Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-09-07T06:39:17.5989441Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-09-07T06:39:17.5989514Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-09-07T06:39:17.5989585Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-09-07T06:39:17.5989657Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-09-07T06:39:17.5989728Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-09-07T06:39:17.5989801Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-09-07T06:39:17.5989873Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-09-07T06:39:17.5990992Z * [new branch] gh/yangw-dev/16/base -> origin/gh/yangw-dev/16/base 2025-09-07T06:39:17.5991068Z * [new branch] gh/yangw-dev/16/head -> origin/gh/yangw-dev/16/head 2025-09-07T06:39:17.5991140Z * [new branch] gh/yangw-dev/16/orig -> origin/gh/yangw-dev/16/orig 2025-09-07T06:39:17.5991212Z * [new branch] gh/yangw-dev/17/base -> origin/gh/yangw-dev/17/base 2025-09-07T06:39:17.5991285Z * [new branch] 
gh/yangw-dev/17/head -> origin/gh/yangw-dev/17/head 2025-09-07T06:39:17.5991357Z * [new branch] gh/yangw-dev/17/orig -> origin/gh/yangw-dev/17/orig 2025-09-07T06:39:17.5991428Z * [new branch] gh/yangw-dev/18/base -> origin/gh/yangw-dev/18/base 2025-09-07T06:39:17.5991501Z * [new branch] gh/yangw-dev/18/head -> origin/gh/yangw-dev/18/head 2025-09-07T06:39:17.5991574Z * [new branch] gh/yangw-dev/18/orig -> origin/gh/yangw-dev/18/orig 2025-09-07T06:39:17.5991645Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-09-07T06:39:17.5991764Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-09-07T06:39:17.5991836Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-09-07T06:39:17.5991910Z * [new branch] gh/yangw-dev/20/base -> origin/gh/yangw-dev/20/base 2025-09-07T06:39:17.5991983Z * [new branch] gh/yangw-dev/20/head -> origin/gh/yangw-dev/20/head 2025-09-07T06:39:17.5992054Z * [new branch] gh/yangw-dev/20/orig -> origin/gh/yangw-dev/20/orig 2025-09-07T06:39:17.5992126Z * [new branch] gh/yangw-dev/21/base -> origin/gh/yangw-dev/21/base 2025-09-07T06:39:17.5992199Z * [new branch] gh/yangw-dev/21/head -> origin/gh/yangw-dev/21/head 2025-09-07T06:39:17.5992302Z * [new branch] gh/yangw-dev/21/orig -> origin/gh/yangw-dev/21/orig 2025-09-07T06:39:17.5992374Z * [new branch] gh/yangw-dev/22/base -> origin/gh/yangw-dev/22/base 2025-09-07T06:39:17.5992448Z * [new branch] gh/yangw-dev/22/head -> origin/gh/yangw-dev/22/head 2025-09-07T06:39:17.5992520Z * [new branch] gh/yangw-dev/22/orig -> origin/gh/yangw-dev/22/orig 2025-09-07T06:39:17.5992591Z * [new branch] gh/yangw-dev/23/base -> origin/gh/yangw-dev/23/base 2025-09-07T06:39:17.5992664Z * [new branch] gh/yangw-dev/23/head -> origin/gh/yangw-dev/23/head 2025-09-07T06:39:17.5992737Z * [new branch] gh/yangw-dev/23/orig -> origin/gh/yangw-dev/23/orig 2025-09-07T06:39:17.5993853Z * [new branch] gh/yangw-dev/24/base -> origin/gh/yangw-dev/24/base 
2025-09-07T06:39:17.5993932Z * [new branch] gh/yangw-dev/24/head -> origin/gh/yangw-dev/24/head 2025-09-07T06:39:17.5994005Z * [new branch] gh/yangw-dev/24/orig -> origin/gh/yangw-dev/24/orig 2025-09-07T06:39:17.5994077Z * [new branch] gh/yangw-dev/25/base -> origin/gh/yangw-dev/25/base 2025-09-07T06:39:17.5994151Z * [new branch] gh/yangw-dev/25/head -> origin/gh/yangw-dev/25/head 2025-09-07T06:39:17.5994222Z * [new branch] gh/yangw-dev/25/orig -> origin/gh/yangw-dev/25/orig 2025-09-07T06:39:17.5994294Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-09-07T06:39:17.5994367Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-09-07T06:39:17.5994439Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-09-07T06:39:17.5994510Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-09-07T06:39:17.5994583Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-09-07T06:39:17.5994656Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-09-07T06:39:17.5994728Z * [new branch] gh/ydwu4/233/base -> origin/gh/ydwu4/233/base 2025-09-07T06:39:17.5994799Z * [new branch] gh/ydwu4/233/head -> origin/gh/ydwu4/233/head 2025-09-07T06:39:17.5994869Z * [new branch] gh/ydwu4/233/orig -> origin/gh/ydwu4/233/orig 2025-09-07T06:39:17.5994938Z * [new branch] gh/ydwu4/246/base -> origin/gh/ydwu4/246/base 2025-09-07T06:39:17.5995006Z * [new branch] gh/ydwu4/246/head -> origin/gh/ydwu4/246/head 2025-09-07T06:39:17.5995074Z * [new branch] gh/ydwu4/246/orig -> origin/gh/ydwu4/246/orig 2025-09-07T06:39:17.5995142Z * [new branch] gh/ydwu4/253/base -> origin/gh/ydwu4/253/base 2025-09-07T06:39:17.5995211Z * [new branch] gh/ydwu4/253/head -> origin/gh/ydwu4/253/head 2025-09-07T06:39:17.5995280Z * [new branch] gh/ydwu4/253/orig -> origin/gh/ydwu4/253/orig 2025-09-07T06:39:17.5995387Z * [new branch] gh/ydwu4/255/base -> origin/gh/ydwu4/255/base 2025-09-07T06:39:17.5995456Z * [new branch] 
gh/ydwu4/255/head -> origin/gh/ydwu4/255/head 2025-09-07T06:39:17.5995525Z * [new branch] gh/ydwu4/255/orig -> origin/gh/ydwu4/255/orig 2025-09-07T06:39:17.5996710Z * [new branch] gh/ydwu4/259/base -> origin/gh/ydwu4/259/base 2025-09-07T06:39:17.5996783Z * [new branch] gh/ydwu4/259/head -> origin/gh/ydwu4/259/head 2025-09-07T06:39:17.5996852Z * [new branch] gh/ydwu4/259/orig -> origin/gh/ydwu4/259/orig 2025-09-07T06:39:17.5996920Z * [new branch] gh/ydwu4/262/base -> origin/gh/ydwu4/262/base 2025-09-07T06:39:17.5997053Z * [new branch] gh/ydwu4/262/head -> origin/gh/ydwu4/262/head 2025-09-07T06:39:17.5997122Z * [new branch] gh/ydwu4/262/orig -> origin/gh/ydwu4/262/orig 2025-09-07T06:39:17.5997191Z * [new branch] gh/ydwu4/263/base -> origin/gh/ydwu4/263/base 2025-09-07T06:39:17.5997258Z * [new branch] gh/ydwu4/263/head -> origin/gh/ydwu4/263/head 2025-09-07T06:39:17.5997327Z * [new branch] gh/ydwu4/263/orig -> origin/gh/ydwu4/263/orig 2025-09-07T06:39:17.5997394Z * [new branch] gh/ydwu4/269/base -> origin/gh/ydwu4/269/base 2025-09-07T06:39:17.5997462Z * [new branch] gh/ydwu4/269/head -> origin/gh/ydwu4/269/head 2025-09-07T06:39:17.5997531Z * [new branch] gh/ydwu4/269/orig -> origin/gh/ydwu4/269/orig 2025-09-07T06:39:17.5997598Z * [new branch] gh/ydwu4/270/base -> origin/gh/ydwu4/270/base 2025-09-07T06:39:17.5997667Z * [new branch] gh/ydwu4/270/head -> origin/gh/ydwu4/270/head 2025-09-07T06:39:17.5997735Z * [new branch] gh/ydwu4/270/orig -> origin/gh/ydwu4/270/orig 2025-09-07T06:39:17.5997805Z * [new branch] gh/ydwu4/272/base -> origin/gh/ydwu4/272/base 2025-09-07T06:39:17.5997873Z * [new branch] gh/ydwu4/272/head -> origin/gh/ydwu4/272/head 2025-09-07T06:39:17.5997942Z * [new branch] gh/ydwu4/272/orig -> origin/gh/ydwu4/272/orig 2025-09-07T06:39:17.5998098Z * [new branch] gh/ydwu4/275/base -> origin/gh/ydwu4/275/base 2025-09-07T06:39:17.5998167Z * [new branch] gh/ydwu4/275/head -> origin/gh/ydwu4/275/head 2025-09-07T06:39:17.5998234Z * [new branch] gh/ydwu4/275/orig 
-> origin/gh/ydwu4/275/orig 2025-09-07T06:39:17.5998303Z * [new branch] gh/ydwu4/276/base -> origin/gh/ydwu4/276/base 2025-09-07T06:39:17.5998372Z * [new branch] gh/ydwu4/276/head -> origin/gh/ydwu4/276/head 2025-09-07T06:39:17.5998441Z * [new branch] gh/ydwu4/276/orig -> origin/gh/ydwu4/276/orig 2025-09-07T06:39:17.5999565Z * [new branch] gh/ydwu4/279/base -> origin/gh/ydwu4/279/base 2025-09-07T06:39:17.5999636Z * [new branch] gh/ydwu4/279/head -> origin/gh/ydwu4/279/head 2025-09-07T06:39:17.5999704Z * [new branch] gh/ydwu4/279/orig -> origin/gh/ydwu4/279/orig 2025-09-07T06:39:17.5999773Z * [new branch] gh/ydwu4/283/base -> origin/gh/ydwu4/283/base 2025-09-07T06:39:17.5999840Z * [new branch] gh/ydwu4/283/head -> origin/gh/ydwu4/283/head 2025-09-07T06:39:17.5999908Z * [new branch] gh/ydwu4/283/orig -> origin/gh/ydwu4/283/orig 2025-09-07T06:39:17.5999977Z * [new branch] gh/ydwu4/289/base -> origin/gh/ydwu4/289/base 2025-09-07T06:39:17.6000046Z * [new branch] gh/ydwu4/289/head -> origin/gh/ydwu4/289/head 2025-09-07T06:39:17.6000114Z * [new branch] gh/ydwu4/289/orig -> origin/gh/ydwu4/289/orig 2025-09-07T06:39:17.6000240Z * [new branch] gh/ydwu4/290/base -> origin/gh/ydwu4/290/base 2025-09-07T06:39:17.6000309Z * [new branch] gh/ydwu4/290/head -> origin/gh/ydwu4/290/head 2025-09-07T06:39:17.6000376Z * [new branch] gh/ydwu4/290/orig -> origin/gh/ydwu4/290/orig 2025-09-07T06:39:17.6000445Z * [new branch] gh/ydwu4/291/base -> origin/gh/ydwu4/291/base 2025-09-07T06:39:17.6000512Z * [new branch] gh/ydwu4/291/head -> origin/gh/ydwu4/291/head 2025-09-07T06:39:17.6000581Z * [new branch] gh/ydwu4/291/orig -> origin/gh/ydwu4/291/orig 2025-09-07T06:39:17.6000648Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-09-07T06:39:17.6000746Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-09-07T06:39:17.6000814Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-09-07T06:39:17.6000883Z * [new branch] gh/ydwu4/293/base -> 
origin/gh/ydwu4/293/base 2025-09-07T06:39:17.6000952Z * [new branch] gh/ydwu4/293/head -> origin/gh/ydwu4/293/head 2025-09-07T06:39:17.6001020Z * [new branch] gh/ydwu4/293/orig -> origin/gh/ydwu4/293/orig 2025-09-07T06:39:17.6001087Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-09-07T06:39:17.6001156Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-09-07T06:39:17.6001224Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-09-07T06:39:17.6001291Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-09-07T06:39:17.6002424Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-09-07T06:39:17.6002494Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-09-07T06:39:17.6002563Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-09-07T06:39:17.6002631Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-09-07T06:39:17.6002699Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-09-07T06:39:17.6002766Z * [new branch] gh/ydwu4/300/base -> origin/gh/ydwu4/300/base 2025-09-07T06:39:17.6002834Z * [new branch] gh/ydwu4/300/head -> origin/gh/ydwu4/300/head 2025-09-07T06:39:17.6002902Z * [new branch] gh/ydwu4/300/orig -> origin/gh/ydwu4/300/orig 2025-09-07T06:39:17.6002969Z * [new branch] gh/ydwu4/301/base -> origin/gh/ydwu4/301/base 2025-09-07T06:39:17.6003039Z * [new branch] gh/ydwu4/301/head -> origin/gh/ydwu4/301/head 2025-09-07T06:39:17.6003108Z * [new branch] gh/ydwu4/301/orig -> origin/gh/ydwu4/301/orig 2025-09-07T06:39:17.6003177Z * [new branch] gh/ydwu4/302/base -> origin/gh/ydwu4/302/base 2025-09-07T06:39:17.6003246Z * [new branch] gh/ydwu4/302/head -> origin/gh/ydwu4/302/head 2025-09-07T06:39:17.6003313Z * [new branch] gh/ydwu4/302/orig -> origin/gh/ydwu4/302/orig 2025-09-07T06:39:17.6003381Z * [new branch] gh/ydwu4/303/base -> origin/gh/ydwu4/303/base 2025-09-07T06:39:17.6003451Z * [new branch] gh/ydwu4/303/head -> 
origin/gh/ydwu4/303/head 2025-09-07T06:39:17.6003518Z * [new branch] gh/ydwu4/303/orig -> origin/gh/ydwu4/303/orig 2025-09-07T06:39:17.6003586Z * [new branch] gh/ydwu4/304/base -> origin/gh/ydwu4/304/base 2025-09-07T06:39:17.6003655Z * [new branch] gh/ydwu4/304/head -> origin/gh/ydwu4/304/head 2025-09-07T06:39:17.6003723Z * [new branch] gh/ydwu4/304/orig -> origin/gh/ydwu4/304/orig 2025-09-07T06:39:17.6003828Z * [new branch] gh/ydwu4/305/base -> origin/gh/ydwu4/305/base 2025-09-07T06:39:17.6003897Z * [new branch] gh/ydwu4/305/head -> origin/gh/ydwu4/305/head 2025-09-07T06:39:17.6003965Z * [new branch] gh/ydwu4/305/orig -> origin/gh/ydwu4/305/orig 2025-09-07T06:39:17.6004033Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-09-07T06:39:17.6005132Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-09-07T06:39:17.6005204Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-09-07T06:39:17.6005272Z * [new branch] gh/ydwu4/307/base -> origin/gh/ydwu4/307/base 2025-09-07T06:39:17.6005384Z * [new branch] gh/ydwu4/307/head -> origin/gh/ydwu4/307/head 2025-09-07T06:39:17.6005453Z * [new branch] gh/ydwu4/307/orig -> origin/gh/ydwu4/307/orig 2025-09-07T06:39:17.6005522Z * [new branch] gh/ydwu4/308/base -> origin/gh/ydwu4/308/base 2025-09-07T06:39:17.6005589Z * [new branch] gh/ydwu4/308/head -> origin/gh/ydwu4/308/head 2025-09-07T06:39:17.6005658Z * [new branch] gh/ydwu4/308/orig -> origin/gh/ydwu4/308/orig 2025-09-07T06:39:17.6005726Z * [new branch] gh/ydwu4/309/base -> origin/gh/ydwu4/309/base 2025-09-07T06:39:17.6005794Z * [new branch] gh/ydwu4/309/head -> origin/gh/ydwu4/309/head 2025-09-07T06:39:17.6005863Z * [new branch] gh/ydwu4/309/orig -> origin/gh/ydwu4/309/orig 2025-09-07T06:39:17.6005930Z * [new branch] gh/ydwu4/310/base -> origin/gh/ydwu4/310/base 2025-09-07T06:39:17.6005999Z * [new branch] gh/ydwu4/310/head -> origin/gh/ydwu4/310/head 2025-09-07T06:39:17.6006068Z * [new branch] gh/ydwu4/310/orig -> 
origin/gh/ydwu4/310/orig 2025-09-07T06:39:17.6006136Z * [new branch] gh/ydwu4/311/base -> origin/gh/ydwu4/311/base 2025-09-07T06:39:17.6006204Z * [new branch] gh/ydwu4/311/head -> origin/gh/ydwu4/311/head 2025-09-07T06:39:17.6006273Z * [new branch] gh/ydwu4/311/orig -> origin/gh/ydwu4/311/orig 2025-09-07T06:39:17.6006341Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-09-07T06:39:17.6006409Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-09-07T06:39:17.6006476Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-09-07T06:39:17.6006612Z * [new branch] gh/ydwu4/313/base -> origin/gh/ydwu4/313/base 2025-09-07T06:39:17.6006681Z * [new branch] gh/ydwu4/313/head -> origin/gh/ydwu4/313/head 2025-09-07T06:39:17.6006748Z * [new branch] gh/ydwu4/313/orig -> origin/gh/ydwu4/313/orig 2025-09-07T06:39:17.6006819Z * [new branch] gh/ydwu4/314/base -> origin/gh/ydwu4/314/base 2025-09-07T06:39:17.6007951Z * [new branch] gh/ydwu4/314/head -> origin/gh/ydwu4/314/head 2025-09-07T06:39:17.6008022Z * [new branch] gh/ydwu4/314/orig -> origin/gh/ydwu4/314/orig 2025-09-07T06:39:17.6008091Z * [new branch] gh/ydwu4/315/base -> origin/gh/ydwu4/315/base 2025-09-07T06:39:17.6008160Z * [new branch] gh/ydwu4/315/head -> origin/gh/ydwu4/315/head 2025-09-07T06:39:17.6008228Z * [new branch] gh/ydwu4/315/orig -> origin/gh/ydwu4/315/orig 2025-09-07T06:39:17.6008297Z * [new branch] gh/ydwu4/316/base -> origin/gh/ydwu4/316/base 2025-09-07T06:39:17.6008367Z * [new branch] gh/ydwu4/316/head -> origin/gh/ydwu4/316/head 2025-09-07T06:39:17.6008435Z * [new branch] gh/ydwu4/316/orig -> origin/gh/ydwu4/316/orig 2025-09-07T06:39:17.6008565Z * [new branch] gh/ydwu4/317/base -> origin/gh/ydwu4/317/base 2025-09-07T06:39:17.6008634Z * [new branch] gh/ydwu4/317/head -> origin/gh/ydwu4/317/head 2025-09-07T06:39:17.6008702Z * [new branch] gh/ydwu4/317/orig -> origin/gh/ydwu4/317/orig 2025-09-07T06:39:17.6008771Z * [new branch] gh/ydwu4/318/base -> 
origin/gh/ydwu4/318/base 2025-09-07T06:39:17.6008839Z * [new branch] gh/ydwu4/318/head -> origin/gh/ydwu4/318/head 2025-09-07T06:39:17.6008907Z * [new branch] gh/ydwu4/318/orig -> origin/gh/ydwu4/318/orig 2025-09-07T06:39:17.6008977Z * [new branch] gh/ydwu4/319/base -> origin/gh/ydwu4/319/base 2025-09-07T06:39:17.6009090Z * [new branch] gh/ydwu4/319/head -> origin/gh/ydwu4/319/head 2025-09-07T06:39:17.6009158Z * [new branch] gh/ydwu4/319/orig -> origin/gh/ydwu4/319/orig 2025-09-07T06:39:17.6009229Z * [new branch] gh/ydwu4/320/base -> origin/gh/ydwu4/320/base 2025-09-07T06:39:17.6009298Z * [new branch] gh/ydwu4/320/head -> origin/gh/ydwu4/320/head 2025-09-07T06:39:17.6009366Z * [new branch] gh/ydwu4/320/orig -> origin/gh/ydwu4/320/orig 2025-09-07T06:39:17.6009435Z * [new branch] gh/ydwu4/321/base -> origin/gh/ydwu4/321/base 2025-09-07T06:39:17.6009503Z * [new branch] gh/ydwu4/321/head -> origin/gh/ydwu4/321/head 2025-09-07T06:39:17.6009571Z * [new branch] gh/ydwu4/321/orig -> origin/gh/ydwu4/321/orig 2025-09-07T06:39:17.6009639Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-09-07T06:39:17.6010763Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-09-07T06:39:17.6010834Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-09-07T06:39:17.6010905Z * [new branch] gh/ydwu4/323/base -> origin/gh/ydwu4/323/base 2025-09-07T06:39:17.6010973Z * [new branch] gh/ydwu4/323/head -> origin/gh/ydwu4/323/head 2025-09-07T06:39:17.6011041Z * [new branch] gh/ydwu4/323/orig -> origin/gh/ydwu4/323/orig 2025-09-07T06:39:17.6011109Z * [new branch] gh/ydwu4/324/base -> origin/gh/ydwu4/324/base 2025-09-07T06:39:17.6011179Z * [new branch] gh/ydwu4/324/head -> origin/gh/ydwu4/324/head 2025-09-07T06:39:17.6011248Z * [new branch] gh/ydwu4/324/orig -> origin/gh/ydwu4/324/orig 2025-09-07T06:39:17.6011315Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-09-07T06:39:17.6011385Z * [new branch] gh/yf225/133/head -> 
origin/gh/yf225/133/head 2025-09-07T06:39:17.6011452Z * [new branch] gh/yf225/171/base -> origin/gh/yf225/171/base 2025-09-07T06:39:17.6011521Z * [new branch] gh/yf225/171/head -> origin/gh/yf225/171/head 2025-09-07T06:39:17.6011589Z * [new branch] gh/yf225/171/orig -> origin/gh/yf225/171/orig 2025-09-07T06:39:17.6011657Z * [new branch] gh/yf225/172/base -> origin/gh/yf225/172/base 2025-09-07T06:39:17.6011724Z * [new branch] gh/yf225/172/head -> origin/gh/yf225/172/head 2025-09-07T06:39:17.6011793Z * [new branch] gh/yf225/172/orig -> origin/gh/yf225/172/orig 2025-09-07T06:39:17.6011863Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-09-07T06:39:17.6011932Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-09-07T06:39:17.6012012Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-09-07T06:39:17.6012090Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-09-07T06:39:17.6012199Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-09-07T06:39:17.6012276Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-09-07T06:39:17.6012352Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-09-07T06:39:17.6012425Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-09-07T06:39:17.6013704Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-09-07T06:39:17.6013781Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-09-07T06:39:17.6013893Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-09-07T06:39:17.6013967Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-09-07T06:39:17.6014042Z * [new branch] gh/ysiraichi/79/base -> origin/gh/ysiraichi/79/base 2025-09-07T06:39:17.6014118Z * [new branch] gh/ysiraichi/79/head -> origin/gh/ysiraichi/79/head 2025-09-07T06:39:17.6014192Z * [new branch] gh/ysiraichi/79/orig -> origin/gh/ysiraichi/79/orig 
2025-09-07T06:39:17.6014266Z * [new branch] gh/ysiraichi/88/base -> origin/gh/ysiraichi/88/base 2025-09-07T06:39:17.6014339Z * [new branch] gh/ysiraichi/88/head -> origin/gh/ysiraichi/88/head 2025-09-07T06:39:17.6014413Z * [new branch] gh/ysiraichi/88/orig -> origin/gh/ysiraichi/88/orig 2025-09-07T06:39:17.6014487Z * [new branch] gh/zhxchen17/25/base -> origin/gh/zhxchen17/25/base 2025-09-07T06:39:17.6014561Z * [new branch] gh/zhxchen17/25/head -> origin/gh/zhxchen17/25/head 2025-09-07T06:39:17.6014634Z * [new branch] gh/zhxchen17/25/orig -> origin/gh/zhxchen17/25/orig 2025-09-07T06:39:17.6014708Z * [new branch] gh/zhxchen17/31/base -> origin/gh/zhxchen17/31/base 2025-09-07T06:39:17.6014783Z * [new branch] gh/zhxchen17/31/head -> origin/gh/zhxchen17/31/head 2025-09-07T06:39:17.6014856Z * [new branch] gh/zhxchen17/31/orig -> origin/gh/zhxchen17/31/orig 2025-09-07T06:39:17.6014930Z * [new branch] gh/zhxchen17/34/base -> origin/gh/zhxchen17/34/base 2025-09-07T06:39:17.6015006Z * [new branch] gh/zhxchen17/34/head -> origin/gh/zhxchen17/34/head 2025-09-07T06:39:17.6015080Z * [new branch] gh/zhxchen17/35/base -> origin/gh/zhxchen17/35/base 2025-09-07T06:39:17.6015154Z * [new branch] gh/zhxchen17/35/head -> origin/gh/zhxchen17/35/head 2025-09-07T06:39:17.6015228Z * [new branch] gh/zhxchen17/37/base -> origin/gh/zhxchen17/37/base 2025-09-07T06:39:17.6015301Z * [new branch] gh/zhxchen17/37/head -> origin/gh/zhxchen17/37/head 2025-09-07T06:39:17.6015376Z * [new branch] gh/zhxchen17/37/orig -> origin/gh/zhxchen17/37/orig 2025-09-07T06:39:17.6015450Z * [new branch] gh/zhxchen17/38/base -> origin/gh/zhxchen17/38/base 2025-09-07T06:39:17.6015522Z * [new branch] gh/zhxchen17/38/head -> origin/gh/zhxchen17/38/head 2025-09-07T06:39:17.6016737Z * [new branch] gh/zhxchen17/38/orig -> origin/gh/zhxchen17/38/orig 2025-09-07T06:39:17.6016812Z * [new branch] gh/zhxchen17/39/base -> origin/gh/zhxchen17/39/base 2025-09-07T06:39:17.6016885Z * [new branch] gh/zhxchen17/39/head -> 
origin/gh/zhxchen17/39/head 2025-09-07T06:39:17.6016959Z * [new branch] gh/zhxchen17/39/orig -> origin/gh/zhxchen17/39/orig 2025-09-07T06:39:17.6017033Z * [new branch] gh/zhxchen17/40/base -> origin/gh/zhxchen17/40/base 2025-09-07T06:39:17.6017106Z * [new branch] gh/zhxchen17/40/head -> origin/gh/zhxchen17/40/head 2025-09-07T06:39:17.6017233Z * [new branch] gh/zhxchen17/40/orig -> origin/gh/zhxchen17/40/orig 2025-09-07T06:39:17.6017307Z * [new branch] gh/zhxchen17/41/base -> origin/gh/zhxchen17/41/base 2025-09-07T06:39:17.6017380Z * [new branch] gh/zhxchen17/41/head -> origin/gh/zhxchen17/41/head 2025-09-07T06:39:17.6017454Z * [new branch] gh/zhxchen17/41/orig -> origin/gh/zhxchen17/41/orig 2025-09-07T06:39:17.6017526Z * [new branch] gh/zhxchen17/42/base -> origin/gh/zhxchen17/42/base 2025-09-07T06:39:17.6017599Z * [new branch] gh/zhxchen17/42/head -> origin/gh/zhxchen17/42/head 2025-09-07T06:39:17.6017720Z * [new branch] gh/zhxchen17/42/orig -> origin/gh/zhxchen17/42/orig 2025-09-07T06:39:17.6017793Z * [new branch] gh/zhxchen17/43/base -> origin/gh/zhxchen17/43/base 2025-09-07T06:39:17.6017866Z * [new branch] gh/zhxchen17/43/head -> origin/gh/zhxchen17/43/head 2025-09-07T06:39:17.6017940Z * [new branch] gh/zhxchen17/43/orig -> origin/gh/zhxchen17/43/orig 2025-09-07T06:39:17.6018014Z * [new branch] gh/zhxchen17/44/base -> origin/gh/zhxchen17/44/base 2025-09-07T06:39:17.6018086Z * [new branch] gh/zhxchen17/44/head -> origin/gh/zhxchen17/44/head 2025-09-07T06:39:17.6018159Z * [new branch] gh/zhxchen17/44/orig -> origin/gh/zhxchen17/44/orig 2025-09-07T06:39:17.6018232Z * [new branch] gh/zhxchen17/45/base -> origin/gh/zhxchen17/45/base 2025-09-07T06:39:17.6018306Z * [new branch] gh/zhxchen17/45/head -> origin/gh/zhxchen17/45/head 2025-09-07T06:39:17.6018381Z * [new branch] gh/zhxchen17/45/orig -> origin/gh/zhxchen17/45/orig 2025-09-07T06:39:17.6018453Z * [new branch] gh/zklaus/10/base -> origin/gh/zklaus/10/base 2025-09-07T06:39:17.6018523Z * [new branch] 
gh/zklaus/10/head -> origin/gh/zklaus/10/head 2025-09-07T06:39:17.6019654Z * [new branch] gh/zklaus/10/orig -> origin/gh/zklaus/10/orig 2025-09-07T06:39:17.6019728Z * [new branch] gh/zklaus/11/base -> origin/gh/zklaus/11/base 2025-09-07T06:39:17.6019798Z * [new branch] gh/zklaus/11/head -> origin/gh/zklaus/11/head 2025-09-07T06:39:17.6019866Z * [new branch] gh/zklaus/11/orig -> origin/gh/zklaus/11/orig 2025-09-07T06:39:17.6019937Z * [new branch] gh/zklaus/12/base -> origin/gh/zklaus/12/base 2025-09-07T06:39:17.6020005Z * [new branch] gh/zklaus/12/head -> origin/gh/zklaus/12/head 2025-09-07T06:39:17.6020076Z * [new branch] gh/zklaus/12/orig -> origin/gh/zklaus/12/orig 2025-09-07T06:39:17.6020145Z * [new branch] gh/zklaus/14/base -> origin/gh/zklaus/14/base 2025-09-07T06:39:17.6020214Z * [new branch] gh/zklaus/14/head -> origin/gh/zklaus/14/head 2025-09-07T06:39:17.6020284Z * [new branch] gh/zklaus/14/orig -> origin/gh/zklaus/14/orig 2025-09-07T06:39:17.6020353Z * [new branch] gh/zklaus/15/base -> origin/gh/zklaus/15/base 2025-09-07T06:39:17.6020422Z * [new branch] gh/zklaus/15/head -> origin/gh/zklaus/15/head 2025-09-07T06:39:17.6020491Z * [new branch] gh/zklaus/15/orig -> origin/gh/zklaus/15/orig 2025-09-07T06:39:17.6020561Z * [new branch] gh/zklaus/16/base -> origin/gh/zklaus/16/base 2025-09-07T06:39:17.6020629Z * [new branch] gh/zklaus/16/head -> origin/gh/zklaus/16/head 2025-09-07T06:39:17.6020700Z * [new branch] gh/zklaus/16/orig -> origin/gh/zklaus/16/orig 2025-09-07T06:39:17.6020768Z * [new branch] gh/zklaus/17/base -> origin/gh/zklaus/17/base 2025-09-07T06:39:17.6020837Z * [new branch] gh/zklaus/17/head -> origin/gh/zklaus/17/head 2025-09-07T06:39:17.6020946Z * [new branch] gh/zklaus/17/orig -> origin/gh/zklaus/17/orig 2025-09-07T06:39:17.6021015Z * [new branch] gh/zklaus/18/base -> origin/gh/zklaus/18/base 2025-09-07T06:39:17.6021085Z * [new branch] gh/zklaus/18/head -> origin/gh/zklaus/18/head 2025-09-07T06:39:17.6021154Z * [new branch] gh/zklaus/18/orig 
-> origin/gh/zklaus/18/orig 2025-09-07T06:39:17.6021223Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-09-07T06:39:17.6021293Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-09-07T06:39:17.6022443Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-09-07T06:39:17.6022514Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-09-07T06:39:17.6022584Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-09-07T06:39:17.6022654Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-09-07T06:39:17.6022725Z * [new branch] gh/zklaus/7/base -> origin/gh/zklaus/7/base 2025-09-07T06:39:17.6022796Z * [new branch] gh/zklaus/7/head -> origin/gh/zklaus/7/head 2025-09-07T06:39:17.6022865Z * [new branch] gh/zklaus/7/orig -> origin/gh/zklaus/7/orig 2025-09-07T06:39:17.6022933Z * [new branch] gh/zklaus/9/base -> origin/gh/zklaus/9/base 2025-09-07T06:39:17.6023002Z * [new branch] gh/zklaus/9/head -> origin/gh/zklaus/9/head 2025-09-07T06:39:17.6023070Z * [new branch] gh/zklaus/9/orig -> origin/gh/zklaus/9/orig 2025-09-07T06:39:17.6023146Z * [new branch] gh/zou3519/1175/base -> origin/gh/zou3519/1175/base 2025-09-07T06:39:17.6023221Z * [new branch] gh/zou3519/1175/head -> origin/gh/zou3519/1175/head 2025-09-07T06:39:17.6023296Z * [new branch] gh/zou3519/1175/orig -> origin/gh/zou3519/1175/orig 2025-09-07T06:39:17.6023368Z * [new branch] gh/zou3519/1177/base -> origin/gh/zou3519/1177/base 2025-09-07T06:39:17.6023440Z * [new branch] gh/zou3519/1177/head -> origin/gh/zou3519/1177/head 2025-09-07T06:39:17.6023511Z * [new branch] gh/zou3519/1177/orig -> origin/gh/zou3519/1177/orig 2025-09-07T06:39:17.6023583Z * [new branch] gh/zou3519/1191/base -> origin/gh/zou3519/1191/base 2025-09-07T06:39:17.6023655Z * [new branch] gh/zou3519/1191/head -> origin/gh/zou3519/1191/head 2025-09-07T06:39:17.6023727Z * [new branch] gh/zou3519/1191/orig -> origin/gh/zou3519/1191/orig 2025-09-07T06:39:17.6023798Z * [new 
branch] gh/zou3519/1192/base -> origin/gh/zou3519/1192/base 2025-09-07T06:39:17.6023870Z * [new branch] gh/zou3519/1192/head -> origin/gh/zou3519/1192/head 2025-09-07T06:39:17.6023942Z * [new branch] gh/zou3519/1192/orig -> origin/gh/zou3519/1192/orig 2025-09-07T06:39:17.6024013Z * [new branch] gh/zou3519/1193/base -> origin/gh/zou3519/1193/base 2025-09-07T06:39:17.6024084Z * [new branch] gh/zou3519/1193/head -> origin/gh/zou3519/1193/head 2025-09-07T06:39:17.6025194Z * [new branch] gh/zou3519/1193/orig -> origin/gh/zou3519/1193/orig 2025-09-07T06:39:17.6025268Z * [new branch] gh/zou3519/1194/base -> origin/gh/zou3519/1194/base 2025-09-07T06:39:17.6025339Z * [new branch] gh/zou3519/1194/head -> origin/gh/zou3519/1194/head 2025-09-07T06:39:17.6025413Z * [new branch] gh/zou3519/1194/orig -> origin/gh/zou3519/1194/orig 2025-09-07T06:39:17.6025484Z * [new branch] gh/zou3519/1195/base -> origin/gh/zou3519/1195/base 2025-09-07T06:39:17.6025592Z * [new branch] gh/zou3519/1195/head -> origin/gh/zou3519/1195/head 2025-09-07T06:39:17.6025665Z * [new branch] gh/zou3519/1195/orig -> origin/gh/zou3519/1195/orig 2025-09-07T06:39:17.6025736Z * [new branch] gh/zou3519/1196/base -> origin/gh/zou3519/1196/base 2025-09-07T06:39:17.6025807Z * [new branch] gh/zou3519/1196/head -> origin/gh/zou3519/1196/head 2025-09-07T06:39:17.6025880Z * [new branch] gh/zou3519/1196/orig -> origin/gh/zou3519/1196/orig 2025-09-07T06:39:17.6025950Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-09-07T06:39:17.6026050Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-09-07T06:39:17.6026121Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-09-07T06:39:17.6026191Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-09-07T06:39:17.6026262Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-09-07T06:39:17.6026334Z * [new branch] gh/zpcore/10/base -> origin/gh/zpcore/10/base 2025-09-07T06:39:17.6026405Z * [new 
branch] gh/zpcore/10/head -> origin/gh/zpcore/10/head 2025-09-07T06:39:17.6026475Z * [new branch] gh/zpcore/10/orig -> origin/gh/zpcore/10/orig 2025-09-07T06:39:17.6026610Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-09-07T06:39:17.6026679Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-09-07T06:39:17.6026750Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-09-07T06:39:17.6026820Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-09-07T06:39:17.6026890Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-09-07T06:39:17.6026961Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-09-07T06:39:17.6028083Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-09-07T06:39:17.6028157Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-09-07T06:39:17.6028226Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-09-07T06:39:17.6028296Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-09-07T06:39:17.6028365Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-09-07T06:39:17.6028436Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-09-07T06:39:17.6028505Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-09-07T06:39:17.6028575Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-09-07T06:39:17.6028646Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-09-07T06:39:17.6028714Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-09-07T06:39:17.6028782Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-09-07T06:39:17.6028850Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-09-07T06:39:17.6028917Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-09-07T06:39:17.6028986Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-09-07T06:39:17.6029055Z * [new branch] gh/zpcore/6/head -> 
origin/gh/zpcore/6/head 2025-09-07T06:39:17.6029123Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-09-07T06:39:17.6029192Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-09-07T06:39:17.6029315Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-09-07T06:39:17.6029384Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-09-07T06:39:17.6029457Z * [new branch] google-main -> origin/google-main 2025-09-07T06:39:17.6029549Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-09-07T06:39:17.6029626Z * [new branch] guangyey/host_alloc -> origin/guangyey/host_alloc 2025-09-07T06:39:17.6029700Z * [new branch] guangyey/reimport -> origin/guangyey/reimport 2025-09-07T06:39:17.6029826Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-09-07T06:39:17.6029967Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-09-07T06:39:17.6031118Z * [new branch] haozhe/bf16-dynamic-shape -> origin/haozhe/bf16-dynamic-shape 2025-09-07T06:39:17.6031187Z * [new branch] hc_baseline -> origin/hc_baseline 2025-09-07T06:39:17.6031251Z * [new branch] hf_update -> origin/hf_update 2025-09-07T06:39:17.6031321Z * [new branch] hhh_decomp_mul -> origin/hhh_decomp_mul 2025-09-07T06:39:17.6031386Z * [new branch] hhh_rand -> origin/hhh_rand 2025-09-07T06:39:17.6031453Z * [new branch] hoy/mmsplitk -> origin/hoy/mmsplitk 2025-09-07T06:39:17.6031530Z * [new branch] hoy/triton-PR3973 -> origin/hoy/triton-PR3973 2025-09-07T06:39:17.6031642Z * [new branch] hoy/triton-coalescing-baseline -> origin/hoy/triton-coalescing-baseline 2025-09-07T06:39:17.6031735Z * [new branch] hoy/triton-coalescing-new -> origin/hoy/triton-coalescing-new 2025-09-07T06:39:17.6031825Z * [new branch] hoy/triton-coalescing-vec -> origin/hoy/triton-coalescing-vec 2025-09-07T06:39:17.6031902Z * [new branch] inductordecompfix -> origin/inductordecompfix 2025-09-07T06:39:17.6031965Z 
* [new branch] inline -> origin/inline 2025-09-07T06:39:17.6032028Z * [new branch] inlining -> origin/inlining 2025-09-07T06:39:17.6032103Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-09-07T06:39:17.6032190Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-09-07T06:39:17.6032256Z * [new branch] int8_sdpa -> origin/int8_sdpa 2025-09-07T06:39:17.6032332Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-09-07T06:39:17.6032403Z * [new branch] issue#58739 -> origin/issue#58739 2025-09-07T06:39:17.6032529Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-09-07T06:39:17.6032636Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-09-07T06:39:17.6032749Z * [new branch] jeanschmidt/disable_rocm_build_tests -> origin/jeanschmidt/disable_rocm_build_tests 2025-09-07T06:39:17.6032839Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-09-07T06:39:17.6032926Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-09-07T06:39:17.6033015Z * [new branch] justinchu/attention-tests -> origin/justinchu/attention-tests 2025-09-07T06:39:17.6034139Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-09-07T06:39:17.6034219Z * [new branch] justinchu/ort-122 -> origin/justinchu/ort-122 2025-09-07T06:39:17.6034305Z * [new branch] justinchuby/dynamo-true -> origin/justinchuby/dynamo-true 2025-09-07T06:39:17.6034418Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-09-07T06:39:17.6034486Z * [new branch] kainan_test -> origin/kainan_test 2025-09-07T06:39:17.6034556Z * [new branch] learnablebias -> origin/learnablebias 2025-09-07T06:39:17.6034661Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-09-07T06:39:17.6034764Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 
2025-09-07T06:39:17.6034846Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-09-07T06:39:17.6034978Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-09-07T06:39:17.6035062Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-09-07T06:39:17.6035136Z * [new branch] lintbuilddocker -> origin/lintbuilddocker 2025-09-07T06:39:17.6035205Z * [new branch] llama4-stable -> origin/llama4-stable 2025-09-07T06:39:17.6035270Z * [new branch] logdetfix -> origin/logdetfix 2025-09-07T06:39:17.6035340Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-09-07T06:39:17.6035418Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-09-07T06:39:17.6035504Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-09-07T06:39:17.6035601Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-09-07T06:39:17.6035707Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-09-07T06:39:17.6035832Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-09-07T06:39:17.6035919Z * [new branch] lucaskabela/issue_120648 -> origin/lucaskabela/issue_120648 2025-09-07T06:39:17.6036017Z * [new branch] lucaskabela/misc_typing_dynamo -> origin/lucaskabela/misc_typing_dynamo 2025-09-07T06:39:17.6036130Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-09-07T06:39:17.6036262Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-09-07T06:39:17.6037499Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-09-07T06:39:17.6037599Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-09-07T06:39:17.6037710Z * [new branch] lucaskabela/typing_symbolic_convert -> 
origin/lucaskabela/typing_symbolic_convert 2025-09-07T06:39:17.6037828Z * [new branch] lucaskabela/typing_utils_improvements -> origin/lucaskabela/typing_utils_improvements 2025-09-07T06:39:17.6037892Z * [new branch] main -> origin/main 2025-09-07T06:39:17.6038081Z * [new branch] main-enable-b200-distributed-tests -> origin/main-enable-b200-distributed-tests 2025-09-07T06:39:17.6038155Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-09-07T06:39:17.6038228Z * [new branch] malfet-patch-12 -> origin/malfet-patch-12 2025-09-07T06:39:17.6038300Z * [new branch] malfet-patch-14 -> origin/malfet-patch-14 2025-09-07T06:39:17.6038370Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-09-07T06:39:17.6038440Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-09-07T06:39:17.6038614Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-09-07T06:39:17.6038776Z * [new branch] malfet/delete-upsteam-cuda -> origin/malfet/delete-upsteam-cuda 2025-09-07T06:39:17.6038874Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-09-07T06:39:17.6038986Z * [new branch] manuel/test-ops-common-allow-mps -> origin/manuel/test-ops-common-allow-mps 2025-09-07T06:39:17.6039063Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-09-07T06:39:17.6039143Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-09-07T06:39:17.6039208Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-09-07T06:39:17.6039345Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-09-07T06:39:17.6039435Z * [new branch] mlazos/backup-test-branch -> origin/mlazos/backup-test-branch 2025-09-07T06:39:17.6039519Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-09-07T06:39:17.6039590Z * [new branch] mlazos/baseline -> origin/mlazos/baseline 2025-09-07T06:39:17.6039690Z * [new branch] mlazos/baseline-graph-breaks -> 
origin/mlazos/baseline-graph-breaks 2025-09-07T06:39:17.6039768Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-09-07T06:39:17.6040908Z * [new branch] mlazos/better-msg -> origin/mlazos/better-msg 2025-09-07T06:39:17.6040979Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-09-07T06:39:17.6041049Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-09-07T06:39:17.6041120Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-09-07T06:39:17.6041184Z * [new branch] mlazos/ck2 -> origin/mlazos/ck2 2025-09-07T06:39:17.6041266Z * [new branch] mlazos/combokernels -> origin/mlazos/combokernels 2025-09-07T06:39:17.6041342Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-09-07T06:39:17.6041420Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-09-07T06:39:17.6041505Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-09-07T06:39:17.6041609Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-09-07T06:39:17.6041686Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-09-07T06:39:17.6041774Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-09-07T06:39:17.6041851Z * [new branch] mlazos/data-gather -> origin/mlazos/data-gather 2025-09-07T06:39:17.6041930Z * [new branch] mlazos/data-ptrs2 -> origin/mlazos/data-ptrs2 2025-09-07T06:39:17.6042004Z * [new branch] mlazos/data-ptrs3 -> origin/mlazos/data-ptrs3 2025-09-07T06:39:17.6042089Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-09-07T06:39:17.6042158Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-09-07T06:39:17.6042231Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-09-07T06:39:17.6042302Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-09-07T06:39:17.6042388Z * [new branch] mlazos/disable-closures -> origin/mlazos/disable-closures 
2025-09-07T06:39:17.6042461Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-09-07T06:39:17.6042534Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-09-07T06:39:17.6042606Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-09-07T06:39:17.6042719Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-09-07T06:39:17.6043849Z * [new branch] mlazos/exp_disable -> origin/mlazos/exp_disable 2025-09-07T06:39:17.6043937Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-09-07T06:39:17.6044011Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-09-07T06:39:17.6044076Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-09-07T06:39:17.6044146Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-09-07T06:39:17.6044228Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-09-07T06:39:17.6044348Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-09-07T06:39:17.6044418Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-09-07T06:39:17.6044488Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-09-07T06:39:17.6044557Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-09-07T06:39:17.6044628Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-09-07T06:39:17.6044691Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-09-07T06:39:17.6044763Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-09-07T06:39:17.6044832Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-09-07T06:39:17.6044901Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-09-07T06:39:17.6044973Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-09-07T06:39:17.6045040Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-09-07T06:39:17.6045110Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-09-07T06:39:17.6045174Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 
2025-09-07T06:39:17.6045239Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-09-07T06:39:17.6045301Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-09-07T06:39:17.6045364Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-09-07T06:39:17.6045427Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-09-07T06:39:17.6045488Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-09-07T06:39:17.6045552Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-09-07T06:39:17.6046740Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-09-07T06:39:17.6046806Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-09-07T06:39:17.6046869Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-09-07T06:39:17.6046931Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-09-07T06:39:17.6046992Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-09-07T06:39:17.6047052Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-09-07T06:39:17.6047130Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-09-07T06:39:17.6047215Z * [new branch] mlazos/init-per-param -> origin/mlazos/init-per-param 2025-09-07T06:39:17.6047294Z * [new branch] mlazos/init_per_param -> origin/mlazos/init_per_param 2025-09-07T06:39:17.6047372Z * [new branch] mlazos/less-guards -> origin/mlazos/less-guards 2025-09-07T06:39:17.6047456Z * [new branch] mlazos/lr-composibility -> origin/mlazos/lr-composibility 2025-09-07T06:39:17.6047585Z * [new branch] mlazos/main -> origin/mlazos/main 2025-09-07T06:39:17.6047683Z * [new branch] mlazos/main-test-enablement -> origin/mlazos/main-test-enablement 2025-09-07T06:39:17.6047749Z * [new branch] mlazos/main2 -> origin/mlazos/main2 2025-09-07T06:39:17.6047838Z * [new branch] mlazos/mark-static-update -> origin/mlazos/mark-static-update 2025-09-07T06:39:17.6047901Z * [new branch] mlazos/mcg -> origin/mlazos/mcg 2025-09-07T06:39:17.6047965Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-09-07T06:39:17.6048039Z * [new branch] mlazos/meta-guards -> 
origin/mlazos/meta-guards 2025-09-07T06:39:17.6048168Z * [new branch] mlazos/mlazos/ck2 -> origin/mlazos/mlazos/ck2 2025-09-07T06:39:17.6048274Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-09-07T06:39:17.6048371Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-09-07T06:39:17.6048441Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-09-07T06:39:17.6048512Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-09-07T06:39:17.6048585Z * [new branch] mlazos/more-tests -> origin/mlazos/more-tests 2025-09-07T06:39:17.6049709Z * [new branch] mlazos/no-cpp -> origin/mlazos/no-cpp 2025-09-07T06:39:17.6049816Z * [new branch] mlazos/no-init-group-handling -> origin/mlazos/no-init-group-handling 2025-09-07T06:39:17.6049887Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-09-07T06:39:17.6049970Z * [new branch] mlazos/opt-bench-exp2 -> origin/mlazos/opt-bench-exp2 2025-09-07T06:39:17.6050041Z * [new branch] mlazos/opt-incr -> origin/mlazos/opt-incr 2025-09-07T06:39:17.6050119Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-09-07T06:39:17.6050191Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-09-07T06:39:17.6050264Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-09-07T06:39:17.6050343Z * [new branch] mlazos/revert-inline -> origin/mlazos/revert-inline 2025-09-07T06:39:17.6050419Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-09-07T06:39:17.6050489Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-09-07T06:39:17.6050559Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-09-07T06:39:17.6050622Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-09-07T06:39:17.6050703Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-09-07T06:39:17.6050792Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 
2025-09-07T06:39:17.6050873Z * [new branch] mlazos/sub-param-fix -> origin/mlazos/sub-param-fix 2025-09-07T06:39:17.6050942Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-09-07T06:39:17.6051024Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-09-07T06:39:17.6051089Z * [new branch] mlazos/test -> origin/mlazos/test 2025-09-07T06:39:17.6051158Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-09-07T06:39:17.6051240Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-09-07T06:39:17.6051319Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-09-07T06:39:17.6051438Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-09-07T06:39:17.6051518Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-09-07T06:39:17.6052634Z * [new branch] mlazos/topo-fix -> origin/mlazos/topo-fix 2025-09-07T06:39:17.6052718Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-09-07T06:39:17.6052792Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-09-07T06:39:17.6052867Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-09-07T06:39:17.6052943Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-09-07T06:39:17.6053059Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-09-07T06:39:17.6053135Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-09-07T06:39:17.6053208Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-09-07T06:39:17.6053281Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-09-07T06:39:17.6053356Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-09-07T06:39:17.6053430Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-09-07T06:39:17.6053504Z * [new branch] modify-setupvllm -> origin/modify-setupvllm 2025-09-07T06:39:17.6053572Z * [new branch] module-shim 
-> origin/module-shim 2025-09-07T06:39:17.6053653Z * [new branch] move-theme-out-docker -> origin/move-theme-out-docker 2025-09-07T06:39:17.6053724Z * [new branch] msaroufim/be1 -> origin/msaroufim/be1 2025-09-07T06:39:17.6053798Z * [new branch] msaroufim/cn_path -> origin/msaroufim/cn_path 2025-09-07T06:39:17.6053892Z * [new branch] msaroufim/dtensorfusedadam -> origin/msaroufim/dtensorfusedadam 2025-09-07T06:39:17.6053965Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-09-07T06:39:17.6054035Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-09-07T06:39:17.6054099Z * [new branch] muon_dev -> origin/muon_dev 2025-09-07T06:39:17.6054162Z * [new branch] muon_dev_1 -> origin/muon_dev_1 2025-09-07T06:39:17.6054240Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-09-07T06:39:17.6054316Z * [new branch] nativert_numoutputs -> origin/nativert_numoutputs 2025-09-07T06:39:17.6054403Z * [new branch] new-modifiy-setupvllm -> origin/new-modifiy-setupvllm 2025-09-07T06:39:17.6055512Z * [new branch] new-setupvllm -> origin/new-setupvllm 2025-09-07T06:39:17.6055585Z * [new branch] new_zeros_dtype -> origin/new_zeros_dtype 2025-09-07T06:39:17.6055655Z * [new branch] newtest-base -> origin/newtest-base 2025-09-07T06:39:17.6055726Z * [new branch] ngimel/cat_perf1 -> origin/ngimel/cat_perf1 2025-09-07T06:39:17.6055798Z * [new branch] ngimel/einsum_fix -> origin/ngimel/einsum_fix 2025-09-07T06:39:17.6055880Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-09-07T06:39:17.6055954Z * [new branch] ngimel/fabric_check -> origin/ngimel/fabric_check 2025-09-07T06:39:17.6056025Z * [new branch] ngimel/fabric_fix -> origin/ngimel/fabric_fix 2025-09-07T06:39:17.6056116Z * [new branch] ngimel/fix_driver_init_error -> origin/ngimel/fix_driver_init_error 2025-09-07T06:39:17.6056205Z * [new branch] ngimel/fix_nccl_segment_seg -> origin/ngimel/fix_nccl_segment_seg 2025-09-07T06:39:17.6056318Z * [new branch] 
ngimel/gg_new -> origin/ngimel/gg_new 2025-09-07T06:39:17.6056389Z * [new branch] ngimel/modeguard -> origin/ngimel/modeguard 2025-09-07T06:39:17.6056468Z * [new branch] ngimel/multicast_fix -> origin/ngimel/multicast_fix 2025-09-07T06:39:17.6056640Z * [new branch] ngimel/rocm_handle_type -> origin/ngimel/rocm_handle_type 2025-09-07T06:39:17.6056727Z * [new branch] ngimel/symm_handle_fabric -> origin/ngimel/symm_handle_fabric 2025-09-07T06:39:17.6056808Z * [new branch] ngimel/unbind_multimem -> origin/ngimel/unbind_multimem 2025-09-07T06:39:17.6056872Z * [new branch] nightly -> origin/nightly 2025-09-07T06:39:17.6057004Z * [new branch] nmacchioni-patch-10 -> origin/nmacchioni-patch-10 2025-09-07T06:39:17.6057084Z * [new branch] nmacchioni-patch-7 -> origin/nmacchioni-patch-7 2025-09-07T06:39:17.6057163Z * [new branch] nmacchioni-patch-8 -> origin/nmacchioni-patch-8 2025-09-07T06:39:17.6057240Z * [new branch] nmacchioni-patch-9 -> origin/nmacchioni-patch-9 2025-09-07T06:39:17.6057319Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-09-07T06:39:17.6057398Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-09-07T06:39:17.6057461Z * [new branch] one-off -> origin/one-off 2025-09-07T06:39:17.6058600Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-09-07T06:39:17.6058675Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-09-07T06:39:17.6058748Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-09-07T06:39:17.6058819Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-09-07T06:39:17.6058890Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-09-07T06:39:17.6058960Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-09-07T06:39:17.6059029Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-09-07T06:39:17.6059097Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-09-07T06:39:17.6059165Z * [new branch] 
orig/release/2.0 -> origin/orig/release/2.0 2025-09-07T06:39:17.6059233Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-09-07T06:39:17.6059301Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-09-07T06:39:17.6059369Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-09-07T06:39:17.6059437Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-09-07T06:39:17.6059506Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-09-07T06:39:17.6059574Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-09-07T06:39:17.6059641Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-09-07T06:39:17.6059710Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-09-07T06:39:17.6059779Z * [new branch] oulgen/fx_graph -> origin/oulgen/fx_graph 2025-09-07T06:39:17.6059849Z * [new branch] padded-tensor -> origin/padded-tensor 2025-09-07T06:39:17.6059915Z * [new branch] pca2 -> origin/pca2 2025-09-07T06:39:17.6059991Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-09-07T06:39:17.6060109Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-09-07T06:39:17.6060260Z * [new branch] pianpwk/invalidate_fake_memo -> origin/pianpwk/invalidate_fake_memo 2025-09-07T06:39:17.6060341Z * [new branch] pianpwk/max_1_strides -> origin/pianpwk/max_1_strides 2025-09-07T06:39:17.6060424Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-09-07T06:39:17.6061560Z * [new branch] pianpwk/nonzero_memo -> origin/pianpwk/nonzero_memo 2025-09-07T06:39:17.6061679Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-09-07T06:39:17.6061781Z * [new branch] pianpwk/oblivious_slice_forward -> origin/pianpwk/oblivious_slice_forward 2025-09-07T06:39:17.6061901Z * [new branch] pianpwk/oblivious_where -> origin/pianpwk/oblivious_where 2025-09-07T06:39:17.6061989Z * [new branch] 
pianpwk/param_static_pgo -> origin/pianpwk/param_static_pgo 2025-09-07T06:39:17.6062076Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-09-07T06:39:17.6062177Z * [new branch] pianpwk/remove_guard_fail_break -> origin/pianpwk/remove_guard_fail_break 2025-09-07T06:39:17.6062266Z * [new branch] pianpwk/slice_fresh_symbols -> origin/pianpwk/slice_fresh_symbols 2025-09-07T06:39:17.6062348Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-09-07T06:39:17.6062463Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-09-07T06:39:17.6062555Z * [new branch] pianpwk/test_slice_fake_impl -> origin/pianpwk/test_slice_fake_impl 2025-09-07T06:39:17.6062656Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-09-07T06:39:17.6062756Z * [new branch] pianpwk/unbacked_channels_last -> origin/pianpwk/unbacked_channels_last 2025-09-07T06:39:17.6062848Z * [new branch] pianpwk/unbacked_safe_conv1d -> origin/pianpwk/unbacked_safe_conv1d 2025-09-07T06:39:17.6062939Z * [new branch] pianpwk/unbacked_sdpa_flash -> origin/pianpwk/unbacked_sdpa_flash 2025-09-07T06:39:17.6063032Z * [new branch] pianpwk/unbacked_should_swap -> origin/pianpwk/unbacked_should_swap 2025-09-07T06:39:17.6063128Z * [new branch] pianpwk/unbacked_should_swap_2 -> origin/pianpwk/unbacked_should_swap_2 2025-09-07T06:39:17.6063224Z * [new branch] pianpwk/unbacked_slice_binding -> origin/pianpwk/unbacked_slice_binding 2025-09-07T06:39:17.6063321Z * [new branch] pianpwk/unbacked_slice_forward -> origin/pianpwk/unbacked_slice_forward 2025-09-07T06:39:17.6063400Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-09-07T06:39:17.6063481Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-09-07T06:39:17.6063575Z * [new branch] pianpwk/whitelist_optimizer -> origin/pianpwk/whitelist_optimizer 2025-09-07T06:39:17.6063644Z * [new branch] 
pin-torchao -> origin/pin-torchao 2025-09-07T06:39:17.6063729Z * [new branch] piz/fall_back_missing_0716 -> origin/piz/fall_back_missing_0716 2025-09-07T06:39:17.6064866Z * [new branch] piz/improve_scatter_0808 -> origin/piz/improve_scatter_0808 2025-09-07T06:39:17.6064939Z * [new branch] pool-separate -> origin/pool-separate 2025-09-07T06:39:17.6065004Z * [new branch] pr-156087 -> origin/pr-156087 2025-09-07T06:39:17.6065068Z * [new branch] pr/131860 -> origin/pr/131860 2025-09-07T06:39:17.6065142Z * [new branch] predispatch_to -> origin/predispatch_to 2025-09-07T06:39:17.6065211Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-09-07T06:39:17.6065278Z * [new branch] pyobjectslot -> origin/pyobjectslot 2025-09-07T06:39:17.6065405Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-09-07T06:39:17.6065490Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-09-07T06:39:17.6065555Z * [new branch] quint-bits -> origin/quint-bits 2025-09-07T06:39:17.6065622Z * [new branch] release/1.10 -> origin/release/1.10 2025-09-07T06:39:17.6065687Z * [new branch] release/1.11 -> origin/release/1.11 2025-09-07T06:39:17.6065750Z * [new branch] release/1.12 -> origin/release/1.12 2025-09-07T06:39:17.6065843Z * [new branch] release/1.13 -> origin/release/1.13 2025-09-07T06:39:17.6065907Z * [new branch] release/1.4 -> origin/release/1.4 2025-09-07T06:39:17.6065973Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-09-07T06:39:17.6066038Z * [new branch] release/1.5 -> origin/release/1.5 2025-09-07T06:39:17.6066101Z * [new branch] release/1.6 -> origin/release/1.6 2025-09-07T06:39:17.6066163Z * [new branch] release/1.7 -> origin/release/1.7 2025-09-07T06:39:17.6066225Z * [new branch] release/1.8 -> origin/release/1.8 2025-09-07T06:39:17.6066287Z * [new branch] release/1.9 -> origin/release/1.9 2025-09-07T06:39:17.6066349Z * [new branch] release/2.0 -> origin/release/2.0 2025-09-07T06:39:17.6066412Z * [new branch] 
release/2.1 -> origin/release/2.1 2025-09-07T06:39:17.6066475Z * [new branch] release/2.2 -> origin/release/2.2 2025-09-07T06:39:17.6067687Z * [new branch] release/2.3 -> origin/release/2.3 2025-09-07T06:39:17.6067755Z * [new branch] release/2.4 -> origin/release/2.4 2025-09-07T06:39:17.6067817Z * [new branch] release/2.5 -> origin/release/2.5 2025-09-07T06:39:17.6067878Z * [new branch] release/2.6 -> origin/release/2.6 2025-09-07T06:39:17.6067941Z * [new branch] release/2.7 -> origin/release/2.7 2025-09-07T06:39:17.6068003Z * [new branch] release/2.8 -> origin/release/2.8 2025-09-07T06:39:17.6068068Z * [new branch] release_notes -> origin/release_notes 2025-09-07T06:39:17.6068158Z * [new branch] remove-actionable-label -> origin/remove-actionable-label 2025-09-07T06:39:17.6068224Z * [new branch] remove-ao -> origin/remove-ao 2025-09-07T06:39:17.6068317Z * [new branch] removedeprecatedvllmtest -> origin/removedeprecatedvllmtest 2025-09-07T06:39:17.6068446Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-09-07T06:39:17.6068570Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-09-07T06:39:17.6068689Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-09-07T06:39:17.6068807Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-09-07T06:39:17.6068924Z * [new branch] replace-pytorch-labs-20250812-204125 -> origin/replace-pytorch-labs-20250812-204125 2025-09-07T06:39:17.6069041Z * [new branch] replace-pytorch-labs-20250812-205624 -> origin/replace-pytorch-labs-20250812-205624 2025-09-07T06:39:17.6069176Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-09-07T06:39:17.6069350Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-09-07T06:39:17.6069455Z 
* [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-09-07T06:39:17.6069629Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-09-07T06:39:17.6069706Z * [new branch] rocm-monitoring -> origin/rocm-monitoring 2025-09-07T06:39:17.6069783Z * [new branch] ruisi/relax_memory -> origin/ruisi/relax_memory 2025-09-07T06:39:17.6069892Z * [new branch] run-torchbench-smoke-test-h100 -> origin/run-torchbench-smoke-test-h100 2025-09-07T06:39:17.6070097Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-09-07T06:39:17.6071245Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-09-07T06:39:17.6071330Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-09-07T06:39:17.6071396Z * [new branch] rzou/njt -> origin/rzou/njt 2025-09-07T06:39:17.6071460Z * [new branch] rzou/pca -> origin/rzou/pca 2025-09-07T06:39:17.6071529Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-09-07T06:39:17.6071606Z * [new branch] rzou/setup_context -> origin/rzou/setup_context 2025-09-07T06:39:17.6071730Z * [new branch] sanchitintel/refactor_aten_int8_woq_gemm -> origin/sanchitintel/refactor_aten_int8_woq_gemm 2025-09-07T06:39:17.6071897Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-09-07T06:39:17.6071996Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-09-07T06:39:17.6072059Z * [new branch] save -> origin/save 2025-09-07T06:39:17.6072123Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-09-07T06:39:17.6072204Z * [new branch] seemethere-patch-1 -> origin/seemethere-patch-1 2025-09-07T06:39:17.6072268Z * [new branch] setupvllm -> origin/setupvllm 2025-09-07T06:39:17.6072339Z * [new branch] share_and_pin_fork -> 
origin/share_and_pin_fork 2025-09-07T06:39:17.6072420Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-09-07T06:39:17.6072501Z * [new branch] shikaili_fp8_allgather -> origin/shikaili_fp8_allgather 2025-09-07T06:39:17.6072580Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-09-07T06:39:17.6072665Z * [new branch] shoumikhin-patch-12 -> origin/shoumikhin-patch-12 2025-09-07T06:39:17.6072756Z * [new branch] simplify-fq-per-channel -> origin/simplify-fq-per-channel 2025-09-07T06:39:17.6072834Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-09-07T06:39:17.6072918Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-09-07T06:39:17.6072990Z * [new branch] sqzhang/flight4 -> origin/sqzhang/flight4 2025-09-07T06:39:17.6073069Z * [new branch] sqzhang/flight4plus -> origin/sqzhang/flight4plus 2025-09-07T06:39:17.6073158Z * [new branch] sraikund/record_funct_test -> origin/sraikund/record_funct_test 2025-09-07T06:39:17.6074274Z * [new branch] sraikund16/test -> origin/sraikund16/test 2025-09-07T06:39:17.6074373Z * [new branch] stablize-compilation-time -> origin/stablize-compilation-time 2025-09-07T06:39:17.6074457Z * [new branch] standalone-templates -> origin/standalone-templates 2025-09-07T06:39:17.6074583Z * [new branch] standalone_package_weights -> origin/standalone_package_weights 2025-09-07T06:39:17.6074660Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-09-07T06:39:17.6074727Z * [new branch] subgraph_fuse -> origin/subgraph_fuse 2025-09-07T06:39:17.6074819Z * [new branch] support-uv-in-collect_env -> origin/support-uv-in-collect_env 2025-09-07T06:39:17.6074884Z * [new branch] sve-poc -> origin/sve-poc 2025-09-07T06:39:17.6074956Z * [new branch] svekars-patch-1 -> origin/svekars-patch-1 2025-09-07T06:39:17.6075021Z * [new branch] switch-bn -> origin/switch-bn 2025-09-07T06:39:17.6075139Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 
2025-09-07T06:39:17.6075224Z * [new branch] tenpercent/ck_rocm_ci_v3 -> origin/tenpercent/ck_rocm_ci_v3 2025-09-07T06:39:17.6075307Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-09-07T06:39:17.6075370Z * [new branch] test-7054 -> origin/test-7054 2025-09-07T06:39:17.6075453Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-09-07T06:39:17.6075556Z * [new branch] test-myst-markdown-docstring -> origin/test-myst-markdown-docstring 2025-09-07T06:39:17.6075618Z * [new branch] test-old -> origin/test-old 2025-09-07T06:39:17.6075723Z * [new branch] test-vec-migration-internally -> origin/test-vec-migration-internally 2025-09-07T06:39:17.6075791Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-09-07T06:39:17.6075859Z * [new branch] test/inductor -> origin/test/inductor 2025-09-07T06:39:17.6075950Z * [new branch] tianren/flex_paged_attn_fix -> origin/tianren/flex_paged_attn_fix 2025-09-07T06:39:17.6076054Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-09-07T06:39:17.6076121Z * [new branch] tianren/test -> origin/tianren/test 2025-09-07T06:39:17.6076198Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-09-07T06:39:17.6077402Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-09-07T06:39:17.6077492Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-09-07T06:39:17.6077577Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-09-07T06:39:17.6077650Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-09-07T06:39:17.6077713Z * [new branch] tree_vec_base -> origin/tree_vec_base 2025-09-07T06:39:17.6077781Z * [new branch] triton-update -> origin/triton-update 2025-09-07T06:39:17.6077849Z * [new branch] triton_kernel -> origin/triton_kernel 2025-09-07T06:39:17.6077922Z * [new branch] triton_kernel_perf -> origin/triton_kernel_perf 2025-09-07T06:39:17.6078078Z 
* [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-09-07T06:39:17.6078182Z * [new branch] tweak-transformer-dependabot -> origin/tweak-transformer-dependabot 2025-09-07T06:39:17.6078247Z * [new branch] type_dec -> origin/type_dec 2025-09-07T06:39:17.6078341Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-09-07T06:39:17.6078483Z * [new branch] update-audio-commit-hash/16818882925-1712-1 -> origin/update-audio-commit-hash/16818882925-1712-1 2025-09-07T06:39:17.6078621Z * [new branch] update-audio-commit-hash/16895560422-1720-1 -> origin/update-audio-commit-hash/16895560422-1720-1 2025-09-07T06:39:17.6078808Z * [new branch] update-audio-commit-hash/16924174496-1738-1 -> origin/update-audio-commit-hash/16924174496-1738-1 2025-09-07T06:39:17.6078938Z * [new branch] update-audio-commit-hash/17002010821-1749-1 -> origin/update-audio-commit-hash/17002010821-1749-1 2025-09-07T06:39:17.6079068Z * [new branch] update-audio-commit-hash/17056004427-1766-1 -> origin/update-audio-commit-hash/17056004427-1766-1 2025-09-07T06:39:17.6079196Z * [new branch] update-audio-commit-hash/17085054029-1767-1 -> origin/update-audio-commit-hash/17085054029-1767-1 2025-09-07T06:39:17.6079325Z * [new branch] update-audio-commit-hash/17142507405-1771-1 -> origin/update-audio-commit-hash/17142507405-1771-1 2025-09-07T06:39:17.6079506Z * [new branch] update-audio-commit-hash/17168762740-1773-1 -> origin/update-audio-commit-hash/17168762740-1773-1 2025-09-07T06:39:17.6079638Z * [new branch] update-audio-commit-hash/17311174639-1780-1 -> origin/update-audio-commit-hash/17311174639-1780-1 2025-09-07T06:39:17.6079768Z * [new branch] update-audio-commit-hash/17336898740-1781-1 -> origin/update-audio-commit-hash/17336898740-1781-1 2025-09-07T06:39:17.6079898Z * [new branch] update-audio-commit-hash/17389727684-1786-1 -> origin/update-audio-commit-hash/17389727684-1786-1 2025-09-07T06:39:17.6080028Z * [new branch] update-audio-commit-hash/17449538142-1790-1 -> 
origin/update-audio-commit-hash/17449538142-1790-1 2025-09-07T06:39:17.6081224Z * [new branch] update-audio-commit-hash/17507351808-1794-1 -> origin/update-audio-commit-hash/17507351808-1794-1 2025-09-07T06:39:17.6081327Z * [new branch] update-dynamic-shapes-doc -> origin/update-dynamic-shapes-doc 2025-09-07T06:39:17.6081479Z * [new branch] update-executorch-commit-hash/15694981040-1626-1 -> origin/update-executorch-commit-hash/15694981040-1626-1 2025-09-07T06:39:17.6081619Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-09-07T06:39:17.6081754Z * [new branch] update-vision-commit-hash/15336342773-1607-1 -> origin/update-vision-commit-hash/15336342773-1607-1 2025-09-07T06:39:17.6081882Z * [new branch] update-vllm-commit-hash/16737365217-1704-1 -> origin/update-vllm-commit-hash/16737365217-1704-1 2025-09-07T06:39:17.6082009Z * [new branch] update-vllm-commit-hash/16843157111-1713-1 -> origin/update-vllm-commit-hash/16843157111-1713-1 2025-09-07T06:39:17.6082135Z * [new branch] update-vllm-commit-hash/16855312394-1714-1 -> origin/update-vllm-commit-hash/16855312394-1714-1 2025-09-07T06:39:17.6082259Z * [new branch] update-vllm-commit-hash/16924174496-1738-1 -> origin/update-vllm-commit-hash/16924174496-1738-1 2025-09-07T06:39:17.6082384Z * [new branch] update-vllm-commit-hash/16952608705-1745-1 -> origin/update-vllm-commit-hash/16952608705-1745-1 2025-09-07T06:39:17.6082510Z * [new branch] update-vllm-commit-hash/16979836546-1748-1 -> origin/update-vllm-commit-hash/16979836546-1748-1 2025-09-07T06:39:17.6082636Z * [new branch] update-vllm-commit-hash/17014576881-1756-1 -> origin/update-vllm-commit-hash/17014576881-1756-1 2025-09-07T06:39:17.6082762Z * [new branch] update-vllm-commit-hash/17027830869-1761-1 -> origin/update-vllm-commit-hash/17027830869-1761-1 2025-09-07T06:39:17.6082887Z * [new branch] update-vllm-commit-hash/17056004427-1766-1 -> origin/update-vllm-commit-hash/17056004427-1766-1 
2025-09-07T06:39:17.6083012Z * [new branch] update-vllm-commit-hash/17085054029-1767-1 -> origin/update-vllm-commit-hash/17085054029-1767-1 2025-09-07T06:39:17.6083142Z * [new branch] update-vllm-commit-hash/17113610216-1768-1 -> origin/update-vllm-commit-hash/17113610216-1768-1 2025-09-07T06:39:17.6083311Z * [new branch] update-vllm-commit-hash/17142507405-1771-1 -> origin/update-vllm-commit-hash/17142507405-1771-1 2025-09-07T06:39:17.6083436Z * [new branch] update-vllm-commit-hash/17181878974-1774-1 -> origin/update-vllm-commit-hash/17181878974-1774-1 2025-09-07T06:39:17.6083561Z * [new branch] update-vllm-commit-hash/17311174639-1780-1 -> origin/update-vllm-commit-hash/17311174639-1780-1 2025-09-07T06:39:17.6083686Z * [new branch] update-vllm-commit-hash/17336898740-1781-1 -> origin/update-vllm-commit-hash/17336898740-1781-1 2025-09-07T06:39:17.6083811Z * [new branch] update-vllm-commit-hash/17364352302-1785-1 -> origin/update-vllm-commit-hash/17364352302-1785-1 2025-09-07T06:39:17.6083978Z * [new branch] update-vllm-commit-hash/17389727684-1786-1 -> origin/update-vllm-commit-hash/17389727684-1786-1 2025-09-07T06:39:17.6084102Z * [new branch] update-vllm-commit-hash/17449538142-1790-1 -> origin/update-vllm-commit-hash/17449538142-1790-1 2025-09-07T06:39:17.6084229Z * [new branch] update-vllm-commit-hash/17480069797-1791-1 -> origin/update-vllm-commit-hash/17480069797-1791-1 2025-09-07T06:39:17.6085402Z * [new branch] update-vllm-commit-hash/17507351808-1794-1 -> origin/update-vllm-commit-hash/17507351808-1794-1 2025-09-07T06:39:17.6085530Z * [new branch] update-xla-commit-hash/16873912760-198-1 -> origin/update-xla-commit-hash/16873912760-198-1 2025-09-07T06:39:17.6085653Z * [new branch] update-xla-commit-hash/17034266655-199-1 -> origin/update-xla-commit-hash/17034266655-199-1 2025-09-07T06:39:17.6085775Z * [new branch] update-xla-commit-hash/17202464405-200-1 -> origin/update-xla-commit-hash/17202464405-200-1 2025-09-07T06:39:17.6085903Z * [new branch] 
update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-09-07T06:39:17.6085985Z * [new branch] update_executorch_pin -> origin/update_executorch_pin 2025-09-07T06:39:17.6086076Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-09-07T06:39:17.6086162Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-09-07T06:39:17.6086246Z * [new branch] update_slow_tests_1752478971 -> origin/update_slow_tests_1752478971 2025-09-07T06:39:17.6086329Z * [new branch] update_slow_tests_1755502951 -> origin/update_slow_tests_1755502951 2025-09-07T06:39:17.6086412Z * [new branch] update_slow_tests_1756107664 -> origin/update_slow_tests_1756107664 2025-09-07T06:39:17.6086585Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-09-07T06:39:17.6086671Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-09-07T06:39:17.6086762Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-09-07T06:39:17.6086827Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-09-07T06:39:17.6086895Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-09-07T06:39:17.6086954Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-09-07T06:39:17.6087012Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-09-07T06:39:17.6087071Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-09-07T06:39:17.6087128Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-09-07T06:39:17.6087185Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-09-07T06:39:17.6087251Z * [new branch] validate_fn -> origin/validate_fn 2025-09-07T06:39:17.6087326Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-09-07T06:39:17.6087396Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-09-07T06:39:17.6088584Z * [new branch] viable/strict -> origin/viable/strict 2025-09-07T06:39:17.6088655Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-09-07T06:39:17.6088719Z * [new branch] 
vllmpin -> origin/vllmpin 2025-09-07T06:39:17.6088804Z * [new branch] wdvr/conda_devcontainer -> origin/wdvr/conda_devcontainer 2025-09-07T06:39:17.6088872Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-09-07T06:39:17.6088949Z * [new branch] weight_sharing_cpp -> origin/weight_sharing_cpp 2025-09-07T06:39:17.6089014Z * [new branch] whc/flight4 -> origin/whc/flight4 2025-09-07T06:39:17.6089146Z * [new branch] whc/flight51 -> origin/whc/flight51 2025-09-07T06:39:17.6089212Z * [new branch] whc/flight53 -> origin/whc/flight53 2025-09-07T06:39:17.6089277Z * [new branch] whc/stage2 -> origin/whc/stage2 2025-09-07T06:39:17.6089341Z * [new branch] whc/uneven -> origin/whc/uneven 2025-09-07T06:39:17.6089414Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-09-07T06:39:17.6089478Z * [new branch] win_warnings -> origin/win_warnings 2025-09-07T06:39:17.6089558Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-09-07T06:39:17.6089630Z * [new branch] workonoldcommit -> origin/workonoldcommit 2025-09-07T06:39:17.6089785Z * [new branch] wychi-autotune-prune-configs-by-shared-mem -> origin/wychi-autotune-prune-configs-by-shared-mem 2025-09-07T06:39:17.6089854Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-09-07T06:39:17.6089927Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-09-07T06:39:17.6090079Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-09-07T06:39:17.6090155Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-09-07T06:39:17.6090228Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-09-07T06:39:17.6090293Z * [new branch] xmfan/ca_api -> origin/xmfan/ca_api 2025-09-07T06:39:17.6090359Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-09-07T06:39:17.6090425Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-09-07T06:39:17.6090501Z * [new branch] 
xmfan/ca_cudagraphs -> origin/xmfan/ca_cudagraphs 2025-09-07T06:39:17.6091642Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-09-07T06:39:17.6091716Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-09-07T06:39:17.6091794Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-09-07T06:39:17.6091873Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-09-07T06:39:17.6091938Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-09-07T06:39:17.6092004Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-09-07T06:39:17.6092071Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-09-07T06:39:17.6092141Z * [new branch] xmfan/ca_mem_base -> origin/xmfan/ca_mem_base 2025-09-07T06:39:17.6092211Z * [new branch] xmfan/ca_mem_fix -> origin/xmfan/ca_mem_fix 2025-09-07T06:39:17.6092285Z * [new branch] xmfan/ca_memory_fix -> origin/xmfan/ca_memory_fix 2025-09-07T06:39:17.6092372Z * [new branch] xmfan/ca_memory_fix_rebased -> origin/xmfan/ca_memory_fix_rebased 2025-09-07T06:39:17.6092502Z * [new branch] xmfan/ca_memory_fix_rebased2 -> origin/xmfan/ca_memory_fix_rebased2 2025-09-07T06:39:17.6092578Z * [new branch] xmfan/ca_move_to_cuda -> origin/xmfan/ca_move_to_cuda 2025-09-07T06:39:17.6092647Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-09-07T06:39:17.6092717Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-09-07T06:39:17.6092807Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-09-07T06:39:17.6092875Z * [new branch] xmfan/ca_scalar -> origin/xmfan/ca_scalar 2025-09-07T06:39:17.6092988Z * [new branch] xmfan/ca_subclass_mem_fix -> origin/xmfan/ca_subclass_mem_fix 2025-09-07T06:39:17.6093059Z * [new branch] xmfan/ca_warm_mem -> origin/xmfan/ca_warm_mem 2025-09-07T06:39:17.6093137Z * [new branch] xmfan/ca_warm_mem_base -> origin/xmfan/ca_warm_mem_base 2025-09-07T06:39:17.6093207Z * [new branch] xmfan/cacu_jun18 -> 
origin/xmfan/cacu_jun18 2025-09-07T06:39:17.6093275Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-09-07T06:39:17.6093344Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-09-07T06:39:17.6093411Z * [new branch] xmfan/cacu_may27 -> origin/xmfan/cacu_may27 2025-09-07T06:39:17.6094547Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-09-07T06:39:17.6094650Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-09-07T06:39:17.6094727Z * [new branch] xmfan/issue_123374 -> origin/xmfan/issue_123374 2025-09-07T06:39:17.6094879Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T06:39:17.6095028Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T06:39:17.6095105Z * [new branch] xmfan/segfault_test -> origin/xmfan/segfault_test 2025-09-07T06:39:17.6095176Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-09-07T06:39:17.6095245Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-09-07T06:39:17.6095309Z * [new branch] xmfan/test -> origin/xmfan/test 2025-09-07T06:39:17.6095398Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-09-07T06:39:17.6095480Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-09-07T06:39:17.6095573Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-09-07T06:39:17.6095649Z * [new branch] yihan_quantization -> origin/yihan_quantization 2025-09-07T06:39:17.6095746Z * [new branch] yiming/add_jit_trace_benchmark -> origin/yiming/add_jit_trace_benchmark 2025-09-07T06:39:17.6095841Z * [new branch] yiming/add_nativert_benchmark -> origin/yiming/add_nativert_benchmark 2025-09-07T06:39:17.6095912Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 
2025-09-07T06:39:17.6095987Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-09-07T06:39:17.6096081Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-09-07T06:39:17.6096158Z * [new branch] zainr/git-push-v2 -> origin/zainr/git-push-v2 2025-09-07T06:39:17.6096243Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-09-07T06:39:17.6096309Z * [new branch] zainr/test -> origin/zainr/test 2025-09-07T06:39:17.6096414Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-09-07T06:39:17.6096552Z * [new branch] zainr/unstable -> origin/zainr/unstable 2025-09-07T06:39:17.6096629Z * [new branch] zainr/unstable-xla -> origin/zainr/unstable-xla 2025-09-07T06:39:17.6097771Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-09-07T06:39:17.6097837Z * [new branch] zb2p -> origin/zb2p 2025-09-07T06:39:17.6097917Z * [new branch] zero_grad_optimization -> origin/zero_grad_optimization 2025-09-07T06:39:17.6098069Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-09-07T06:39:17.6098146Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-09-07T06:39:17.6098227Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-09-07T06:39:17.6098294Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-09-07T06:39:17.6098463Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 2025-09-07T06:39:17.6098526Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-09-07T06:39:17.6098593Z * [new tag] ciflow/binaries/156049 -> ciflow/binaries/156049 2025-09-07T06:39:17.6098658Z * [new tag] ciflow/binaries/156712 -> ciflow/binaries/156712 2025-09-07T06:39:17.6098724Z * [new tag] ciflow/binaries/157432 -> ciflow/binaries/157432 2025-09-07T06:39:17.6098789Z * [new tag] ciflow/binaries/157685 -> ciflow/binaries/157685 2025-09-07T06:39:17.6098853Z * [new tag] ciflow/binaries/157689 -> 
ciflow/binaries/157689 2025-09-07T06:39:17.6098920Z * [new tag] ciflow/binaries/158104 -> ciflow/binaries/158104 2025-09-07T06:39:17.6098984Z * [new tag] ciflow/binaries/160229 -> ciflow/binaries/160229 2025-09-07T06:39:17.6099048Z * [new tag] ciflow/binaries/160720 -> ciflow/binaries/160720 2025-09-07T06:39:17.6099113Z * [new tag] ciflow/binaries/162080 -> ciflow/binaries/162080 2025-09-07T06:39:17.6099176Z * [new tag] ciflow/binaries/162329 -> ciflow/binaries/162329 2025-09-07T06:39:17.6099263Z * [new tag] ciflow/binaries_libtorch/156049 -> ciflow/binaries_libtorch/156049 2025-09-07T06:39:17.6099349Z * [new tag] ciflow/binaries_libtorch/156711 -> ciflow/binaries_libtorch/156711 2025-09-07T06:39:17.6099434Z * [new tag] ciflow/binaries_libtorch/157432 -> ciflow/binaries_libtorch/157432 2025-09-07T06:39:17.6099511Z * [new tag] ciflow/binaries_wheel/156049 -> ciflow/binaries_wheel/156049 2025-09-07T06:39:17.6099588Z * [new tag] ciflow/binaries_wheel/156711 -> ciflow/binaries_wheel/156711 2025-09-07T06:39:17.6100713Z * [new tag] ciflow/binaries_wheel/157432 -> ciflow/binaries_wheel/157432 2025-09-07T06:39:17.6100790Z * [new tag] ciflow/binaries_wheel/162136 -> ciflow/binaries_wheel/162136 2025-09-07T06:39:17.6100865Z * [new tag] ciflow/binaries_wheel/162252 -> ciflow/binaries_wheel/162252 2025-09-07T06:39:17.6100938Z * [new tag] ciflow/binaries_wheel/162325 -> ciflow/binaries_wheel/162325 2025-09-07T06:39:17.6101023Z * [new tag] ciflow/h100-distributed/156703 -> ciflow/h100-distributed/156703 2025-09-07T06:39:17.6101099Z * [new tag] ciflow/h100-symm-mem/157635 -> ciflow/h100-symm-mem/157635 2025-09-07T06:39:17.6101171Z * [new tag] ciflow/h100-symm-mem/161984 -> ciflow/h100-symm-mem/161984 2025-09-07T06:39:17.6101241Z * [new tag] ciflow/h100-symm-mem/162003 -> ciflow/h100-symm-mem/162003 2025-09-07T06:39:17.6101574Z * [new tag] ciflow/h100-symm-mem/162011 -> ciflow/h100-symm-mem/162011 2025-09-07T06:39:17.6101645Z * [new tag] ciflow/h100-symm-mem/162026 -> 
ciflow/h100-symm-mem/162026 2025-09-07T06:39:17.6101714Z * [new tag] ciflow/h100-symm-mem/162033 -> ciflow/h100-symm-mem/162033 2025-09-07T06:39:17.6101783Z * [new tag] ciflow/h100-symm-mem/162040 -> ciflow/h100-symm-mem/162040 2025-09-07T06:39:17.6101854Z * [new tag] ciflow/h100-symm-mem/162041 -> ciflow/h100-symm-mem/162041 2025-09-07T06:39:17.6101923Z * [new tag] ciflow/h100-symm-mem/162142 -> ciflow/h100-symm-mem/162142 2025-09-07T06:39:17.6102022Z * [new tag] ciflow/h100-symm-mem/162150 -> ciflow/h100-symm-mem/162150 2025-09-07T06:39:17.6102092Z * [new tag] ciflow/h100-symm-mem/162243 -> ciflow/h100-symm-mem/162243 2025-09-07T06:39:17.6102163Z * [new tag] ciflow/h100-symm-mem/162320 -> ciflow/h100-symm-mem/162320 2025-09-07T06:39:17.6102223Z * [new tag] ciflow/h100/159158 -> ciflow/h100/159158 2025-09-07T06:39:17.6102283Z * [new tag] ciflow/h100/160480 -> ciflow/h100/160480 2025-09-07T06:39:17.6102342Z * [new tag] ciflow/h100/161749 -> ciflow/h100/161749 2025-09-07T06:39:17.6102399Z * [new tag] ciflow/h100/162022 -> ciflow/h100/162022 2025-09-07T06:39:17.6102457Z * [new tag] ciflow/h100/162278 -> ciflow/h100/162278 2025-09-07T06:39:17.6102599Z * [new tag] ciflow/inductor-perf-test-nightly-rocm/156592 -> ciflow/inductor-perf-test-nightly-rocm/156592 2025-09-07T06:39:17.6102720Z * [new tag] ciflow/inductor-perf-test-nightly/156592 -> ciflow/inductor-perf-test-nightly/156592 2025-09-07T06:39:17.6103874Z * [new tag] ciflow/inductor-periodic/162063 -> ciflow/inductor-periodic/162063 2025-09-07T06:39:17.6103967Z * [new tag] ciflow/inductor-periodic/162227 -> ciflow/inductor-periodic/162227 2025-09-07T06:39:17.6104054Z * [new tag] ciflow/inductor-periodic/162323 -> ciflow/inductor-periodic/162323 2025-09-07T06:39:17.6104130Z * [new tag] ciflow/inductor-rocm/154170 -> ciflow/inductor-rocm/154170 2025-09-07T06:39:17.6104202Z * [new tag] ciflow/inductor-rocm/159146 -> ciflow/inductor-rocm/159146 2025-09-07T06:39:17.6104276Z * [new tag] ciflow/inductor-rocm/159158 -> 
ciflow/inductor-rocm/159158 2025-09-07T06:39:17.6104351Z * [new tag] ciflow/inductor-rocm/161715 -> ciflow/inductor-rocm/161715 2025-09-07T06:39:17.6104424Z * [new tag] ciflow/inductor-rocm/162053 -> ciflow/inductor-rocm/162053 2025-09-07T06:39:17.6104496Z * [new tag] ciflow/inductor-rocm/162056 -> ciflow/inductor-rocm/162056 2025-09-07T06:39:17.6104564Z * [new tag] ciflow/inductor/137400 -> ciflow/inductor/137400 2025-09-07T06:39:17.6104631Z * [new tag] ciflow/inductor/148180 -> ciflow/inductor/148180 2025-09-07T06:39:17.6104697Z * [new tag] ciflow/inductor/148328 -> ciflow/inductor/148328 2025-09-07T06:39:17.6104763Z * [new tag] ciflow/inductor/148484 -> ciflow/inductor/148484 2025-09-07T06:39:17.6104827Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-09-07T06:39:17.6104890Z * [new tag] ciflow/inductor/152624 -> ciflow/inductor/152624 2025-09-07T06:39:17.6104954Z * [new tag] ciflow/inductor/154694 -> ciflow/inductor/154694 2025-09-07T06:39:17.6105019Z * [new tag] ciflow/inductor/156049 -> ciflow/inductor/156049 2025-09-07T06:39:17.6105083Z * [new tag] ciflow/inductor/156592 -> ciflow/inductor/156592 2025-09-07T06:39:17.6105146Z * [new tag] ciflow/inductor/157635 -> ciflow/inductor/157635 2025-09-07T06:39:17.6105259Z * [new tag] ciflow/inductor/157685 -> ciflow/inductor/157685 2025-09-07T06:39:17.6105324Z * [new tag] ciflow/inductor/157686 -> ciflow/inductor/157686 2025-09-07T06:39:17.6105387Z * [new tag] ciflow/inductor/157689 -> ciflow/inductor/157689 2025-09-07T06:39:17.6105452Z * [new tag] ciflow/inductor/157699 -> ciflow/inductor/157699 2025-09-07T06:39:17.6105516Z * [new tag] ciflow/inductor/157743 -> ciflow/inductor/157743 2025-09-07T06:39:17.6105579Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-09-07T06:39:17.6106855Z * [new tag] ciflow/inductor/158091 -> ciflow/inductor/158091 2025-09-07T06:39:17.6106921Z * [new tag] ciflow/inductor/158104 -> ciflow/inductor/158104 2025-09-07T06:39:17.6106984Z * [new tag] 
ciflow/inductor/158404 -> ciflow/inductor/158404 2025-09-07T06:39:17.6107051Z * [new tag] ciflow/inductor/158647 -> ciflow/inductor/158647 2025-09-07T06:39:17.6107115Z * [new tag] ciflow/inductor/158932 -> ciflow/inductor/158932 2025-09-07T06:39:17.6107178Z * [new tag] ciflow/inductor/159146 -> ciflow/inductor/159146 2025-09-07T06:39:17.6107244Z * [new tag] ciflow/inductor/159158 -> ciflow/inductor/159158 2025-09-07T06:39:17.6107307Z * [new tag] ciflow/inductor/159274 -> ciflow/inductor/159274 2025-09-07T06:39:17.6107371Z * [new tag] ciflow/inductor/159664 -> ciflow/inductor/159664 2025-09-07T06:39:17.6107435Z * [new tag] ciflow/inductor/159778 -> ciflow/inductor/159778 2025-09-07T06:39:17.6107501Z * [new tag] ciflow/inductor/159835 -> ciflow/inductor/159835 2025-09-07T06:39:17.6107565Z * [new tag] ciflow/inductor/159944 -> ciflow/inductor/159944 2025-09-07T06:39:17.6107631Z * [new tag] ciflow/inductor/160161 -> ciflow/inductor/160161 2025-09-07T06:39:17.6107695Z * [new tag] ciflow/inductor/160174 -> ciflow/inductor/160174 2025-09-07T06:39:17.6107759Z * [new tag] ciflow/inductor/160323 -> ciflow/inductor/160323 2025-09-07T06:39:17.6107824Z * [new tag] ciflow/inductor/160324 -> ciflow/inductor/160324 2025-09-07T06:39:17.6107889Z * [new tag] ciflow/inductor/160325 -> ciflow/inductor/160325 2025-09-07T06:39:17.6107952Z * [new tag] ciflow/inductor/160326 -> ciflow/inductor/160326 2025-09-07T06:39:17.6108016Z * [new tag] ciflow/inductor/160327 -> ciflow/inductor/160327 2025-09-07T06:39:17.6108083Z * [new tag] ciflow/inductor/160328 -> ciflow/inductor/160328 2025-09-07T06:39:17.6108146Z * [new tag] ciflow/inductor/160329 -> ciflow/inductor/160329 2025-09-07T06:39:17.6108211Z * [new tag] ciflow/inductor/160480 -> ciflow/inductor/160480 2025-09-07T06:39:17.6108276Z * [new tag] ciflow/inductor/160532 -> ciflow/inductor/160532 2025-09-07T06:39:17.6108340Z * [new tag] ciflow/inductor/160539 -> ciflow/inductor/160539 2025-09-07T06:39:17.6109458Z * [new tag] 
ciflow/inductor/160580 -> ciflow/inductor/160580 2025-09-07T06:39:17.6109525Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-09-07T06:39:17.6109589Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-09-07T06:39:17.6109652Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-09-07T06:39:17.6109719Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-09-07T06:39:17.6109783Z * [new tag] ciflow/inductor/160690 -> ciflow/inductor/160690 2025-09-07T06:39:17.6109899Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-09-07T06:39:17.6109965Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-09-07T06:39:17.6110029Z * [new tag] ciflow/inductor/160798 -> ciflow/inductor/160798 2025-09-07T06:39:17.6110092Z * [new tag] ciflow/inductor/160836 -> ciflow/inductor/160836 2025-09-07T06:39:17.6110157Z * [new tag] ciflow/inductor/160843 -> ciflow/inductor/160843 2025-09-07T06:39:17.6110222Z * [new tag] ciflow/inductor/160869 -> ciflow/inductor/160869 2025-09-07T06:39:17.6110285Z * [new tag] ciflow/inductor/160920 -> ciflow/inductor/160920 2025-09-07T06:39:17.6110383Z * [new tag] ciflow/inductor/160943 -> ciflow/inductor/160943 2025-09-07T06:39:17.6110446Z * [new tag] ciflow/inductor/161092 -> ciflow/inductor/161092 2025-09-07T06:39:17.6110510Z * [new tag] ciflow/inductor/161093 -> ciflow/inductor/161093 2025-09-07T06:39:17.6110577Z * [new tag] ciflow/inductor/161109 -> ciflow/inductor/161109 2025-09-07T06:39:17.6110641Z * [new tag] ciflow/inductor/161118 -> ciflow/inductor/161118 2025-09-07T06:39:17.6110704Z * [new tag] ciflow/inductor/161178 -> ciflow/inductor/161178 2025-09-07T06:39:17.6110769Z * [new tag] ciflow/inductor/161246 -> ciflow/inductor/161246 2025-09-07T06:39:17.6110833Z * [new tag] ciflow/inductor/161349 -> ciflow/inductor/161349 2025-09-07T06:39:17.6110897Z * [new tag] ciflow/inductor/161350 -> ciflow/inductor/161350 2025-09-07T06:39:17.6110962Z * [new tag] 
ciflow/inductor/161351 -> ciflow/inductor/161351 2025-09-07T06:39:17.6111027Z * [new tag] ciflow/inductor/161397 -> ciflow/inductor/161397 2025-09-07T06:39:17.6112137Z * [new tag] ciflow/inductor/161404 -> ciflow/inductor/161404 2025-09-07T06:39:17.6112206Z * [new tag] ciflow/inductor/161405 -> ciflow/inductor/161405 2025-09-07T06:39:17.6112270Z * [new tag] ciflow/inductor/161406 -> ciflow/inductor/161406 2025-09-07T06:39:17.6112333Z * [new tag] ciflow/inductor/161410 -> ciflow/inductor/161410 2025-09-07T06:39:17.6112397Z * [new tag] ciflow/inductor/161414 -> ciflow/inductor/161414 2025-09-07T06:39:17.6112463Z * [new tag] ciflow/inductor/161442 -> ciflow/inductor/161442 2025-09-07T06:39:17.6112527Z * [new tag] ciflow/inductor/161458 -> ciflow/inductor/161458 2025-09-07T06:39:17.6112593Z * [new tag] ciflow/inductor/161468 -> ciflow/inductor/161468 2025-09-07T06:39:17.6112657Z * [new tag] ciflow/inductor/161469 -> ciflow/inductor/161469 2025-09-07T06:39:17.6112721Z * [new tag] ciflow/inductor/161485 -> ciflow/inductor/161485 2025-09-07T06:39:17.6112786Z * [new tag] ciflow/inductor/161499 -> ciflow/inductor/161499 2025-09-07T06:39:17.6112851Z * [new tag] ciflow/inductor/161534 -> ciflow/inductor/161534 2025-09-07T06:39:17.6112914Z * [new tag] ciflow/inductor/161595 -> ciflow/inductor/161595 2025-09-07T06:39:17.6112978Z * [new tag] ciflow/inductor/161596 -> ciflow/inductor/161596 2025-09-07T06:39:17.6113042Z * [new tag] ciflow/inductor/161630 -> ciflow/inductor/161630 2025-09-07T06:39:17.6113106Z * [new tag] ciflow/inductor/161667 -> ciflow/inductor/161667 2025-09-07T06:39:17.6113171Z * [new tag] ciflow/inductor/161670 -> ciflow/inductor/161670 2025-09-07T06:39:17.6113236Z * [new tag] ciflow/inductor/161673 -> ciflow/inductor/161673 2025-09-07T06:39:17.6113300Z * [new tag] ciflow/inductor/161674 -> ciflow/inductor/161674 2025-09-07T06:39:17.6113398Z * [new tag] ciflow/inductor/161675 -> ciflow/inductor/161675 2025-09-07T06:39:17.6113463Z * [new tag] 
ciflow/inductor/161693 -> ciflow/inductor/161693 2025-09-07T06:39:17.6113527Z * [new tag] ciflow/inductor/161695 -> ciflow/inductor/161695 2025-09-07T06:39:17.6113591Z * [new tag] ciflow/inductor/161715 -> ciflow/inductor/161715 2025-09-07T06:39:17.6113654Z * [new tag] ciflow/inductor/161730 -> ciflow/inductor/161730 2025-09-07T06:39:17.6114751Z * [new tag] ciflow/inductor/161732 -> ciflow/inductor/161732 2025-09-07T06:39:17.6114817Z * [new tag] ciflow/inductor/161744 -> ciflow/inductor/161744 2025-09-07T06:39:17.6114914Z * [new tag] ciflow/inductor/161746 -> ciflow/inductor/161746 2025-09-07T06:39:17.6114978Z * [new tag] ciflow/inductor/161747 -> ciflow/inductor/161747 2025-09-07T06:39:17.6115044Z * [new tag] ciflow/inductor/161819 -> ciflow/inductor/161819 2025-09-07T06:39:17.6115108Z * [new tag] ciflow/inductor/161821 -> ciflow/inductor/161821 2025-09-07T06:39:17.6115171Z * [new tag] ciflow/inductor/161828 -> ciflow/inductor/161828 2025-09-07T06:39:17.6115235Z * [new tag] ciflow/inductor/161879 -> ciflow/inductor/161879 2025-09-07T06:39:17.6115298Z * [new tag] ciflow/inductor/161880 -> ciflow/inductor/161880 2025-09-07T06:39:17.6115362Z * [new tag] ciflow/inductor/161881 -> ciflow/inductor/161881 2025-09-07T06:39:17.6115426Z * [new tag] ciflow/inductor/161907 -> ciflow/inductor/161907 2025-09-07T06:39:17.6115491Z * [new tag] ciflow/inductor/161914 -> ciflow/inductor/161914 2025-09-07T06:39:17.6115555Z * [new tag] ciflow/inductor/161924 -> ciflow/inductor/161924 2025-09-07T06:39:17.6115620Z * [new tag] ciflow/inductor/161936 -> ciflow/inductor/161936 2025-09-07T06:39:17.6115683Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-09-07T06:39:17.6115749Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-09-07T06:39:17.6115812Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-09-07T06:39:17.6115876Z * [new tag] ciflow/inductor/161955 -> ciflow/inductor/161955 2025-09-07T06:39:17.6115941Z * [new tag] 
ciflow/inductor/161957 -> ciflow/inductor/161957 2025-09-07T06:39:17.6116004Z * [new tag] ciflow/inductor/161975 -> ciflow/inductor/161975 2025-09-07T06:39:17.6116070Z * [new tag] ciflow/inductor/161977 -> ciflow/inductor/161977 2025-09-07T06:39:17.6116138Z * [new tag] ciflow/inductor/161978 -> ciflow/inductor/161978 2025-09-07T06:39:17.6116203Z * [new tag] ciflow/inductor/161979 -> ciflow/inductor/161979 2025-09-07T06:39:17.6116267Z * [new tag] ciflow/inductor/161980 -> ciflow/inductor/161980 2025-09-07T06:39:17.6117443Z * [new tag] ciflow/inductor/161988 -> ciflow/inductor/161988 2025-09-07T06:39:17.6117511Z * [new tag] ciflow/inductor/161994 -> ciflow/inductor/161994 2025-09-07T06:39:17.6117575Z * [new tag] ciflow/inductor/162013 -> ciflow/inductor/162013 2025-09-07T06:39:17.6117640Z * [new tag] ciflow/inductor/162014 -> ciflow/inductor/162014 2025-09-07T06:39:17.6117704Z * [new tag] ciflow/inductor/162017 -> ciflow/inductor/162017 2025-09-07T06:39:17.6117769Z * [new tag] ciflow/inductor/162021 -> ciflow/inductor/162021 2025-09-07T06:39:17.6117834Z * [new tag] ciflow/inductor/162023 -> ciflow/inductor/162023 2025-09-07T06:39:17.6117897Z * [new tag] ciflow/inductor/162027 -> ciflow/inductor/162027 2025-09-07T06:39:17.6118107Z * [new tag] ciflow/inductor/162029 -> ciflow/inductor/162029 2025-09-07T06:39:17.6118172Z * [new tag] ciflow/inductor/162030 -> ciflow/inductor/162030 2025-09-07T06:39:17.6118236Z * [new tag] ciflow/inductor/162031 -> ciflow/inductor/162031 2025-09-07T06:39:17.6118300Z * [new tag] ciflow/inductor/162033 -> ciflow/inductor/162033 2025-09-07T06:39:17.6118364Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-09-07T06:39:17.6118429Z * [new tag] ciflow/inductor/162053 -> ciflow/inductor/162053 2025-09-07T06:39:17.6118540Z * [new tag] ciflow/inductor/162056 -> ciflow/inductor/162056 2025-09-07T06:39:17.6118604Z * [new tag] ciflow/inductor/162063 -> ciflow/inductor/162063 2025-09-07T06:39:17.6118670Z * [new tag] 
ciflow/inductor/162066 -> ciflow/inductor/162066 2025-09-07T06:39:17.6118735Z * [new tag] ciflow/inductor/162068 -> ciflow/inductor/162068 2025-09-07T06:39:17.6118799Z * [new tag] ciflow/inductor/162081 -> ciflow/inductor/162081 2025-09-07T06:39:17.6118865Z * [new tag] ciflow/inductor/162088 -> ciflow/inductor/162088 2025-09-07T06:39:17.6118929Z * [new tag] ciflow/inductor/162089 -> ciflow/inductor/162089 2025-09-07T06:39:17.6118992Z * [new tag] ciflow/inductor/162094 -> ciflow/inductor/162094 2025-09-07T06:39:17.6119058Z * [new tag] ciflow/inductor/162098 -> ciflow/inductor/162098 2025-09-07T06:39:17.6119125Z * [new tag] ciflow/inductor/162101 -> ciflow/inductor/162101 2025-09-07T06:39:17.6119191Z * [new tag] ciflow/inductor/162102 -> ciflow/inductor/162102 2025-09-07T06:39:17.6120314Z * [new tag] ciflow/inductor/162104 -> ciflow/inductor/162104 2025-09-07T06:39:17.6120384Z * [new tag] ciflow/inductor/162106 -> ciflow/inductor/162106 2025-09-07T06:39:17.6120448Z * [new tag] ciflow/inductor/162108 -> ciflow/inductor/162108 2025-09-07T06:39:17.6120512Z * [new tag] ciflow/inductor/162126 -> ciflow/inductor/162126 2025-09-07T06:39:17.6120576Z * [new tag] ciflow/inductor/162149 -> ciflow/inductor/162149 2025-09-07T06:39:17.6120640Z * [new tag] ciflow/inductor/162164 -> ciflow/inductor/162164 2025-09-07T06:39:17.6120705Z * [new tag] ciflow/inductor/162166 -> ciflow/inductor/162166 2025-09-07T06:39:17.6120770Z * [new tag] ciflow/inductor/162169 -> ciflow/inductor/162169 2025-09-07T06:39:17.6120833Z * [new tag] ciflow/inductor/162170 -> ciflow/inductor/162170 2025-09-07T06:39:17.6120897Z * [new tag] ciflow/inductor/162171 -> ciflow/inductor/162171 2025-09-07T06:39:17.6120964Z * [new tag] ciflow/inductor/162183 -> ciflow/inductor/162183 2025-09-07T06:39:17.6121027Z * [new tag] ciflow/inductor/162189 -> ciflow/inductor/162189 2025-09-07T06:39:17.6121091Z * [new tag] ciflow/inductor/162190 -> ciflow/inductor/162190 2025-09-07T06:39:17.6121156Z * [new tag] 
ciflow/inductor/162191 -> ciflow/inductor/162191 2025-09-07T06:39:17.6121219Z * [new tag] ciflow/inductor/162194 -> ciflow/inductor/162194 2025-09-07T06:39:17.6121283Z * [new tag] ciflow/inductor/162200 -> ciflow/inductor/162200 2025-09-07T06:39:17.6121347Z * [new tag] ciflow/inductor/162201 -> ciflow/inductor/162201 2025-09-07T06:39:17.6121413Z * [new tag] ciflow/inductor/162208 -> ciflow/inductor/162208 2025-09-07T06:39:17.6121476Z * [new tag] ciflow/inductor/162211 -> ciflow/inductor/162211 2025-09-07T06:39:17.6121586Z * [new tag] ciflow/inductor/162216 -> ciflow/inductor/162216 2025-09-07T06:39:17.6121651Z * [new tag] ciflow/inductor/162220 -> ciflow/inductor/162220 2025-09-07T06:39:17.6121714Z * [new tag] ciflow/inductor/162222 -> ciflow/inductor/162222 2025-09-07T06:39:17.6121779Z * [new tag] ciflow/inductor/162227 -> ciflow/inductor/162227 2025-09-07T06:39:17.6121843Z * [new tag] ciflow/inductor/162238 -> ciflow/inductor/162238 2025-09-07T06:39:17.6122952Z * [new tag] ciflow/inductor/162239 -> ciflow/inductor/162239 2025-09-07T06:39:17.6123019Z * [new tag] ciflow/inductor/162240 -> ciflow/inductor/162240 2025-09-07T06:39:17.6123119Z * [new tag] ciflow/inductor/162244 -> ciflow/inductor/162244 2025-09-07T06:39:17.6123182Z * [new tag] ciflow/inductor/162245 -> ciflow/inductor/162245 2025-09-07T06:39:17.6123248Z * [new tag] ciflow/inductor/162262 -> ciflow/inductor/162262 2025-09-07T06:39:17.6123312Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-09-07T06:39:17.6123376Z * [new tag] ciflow/inductor/162278 -> ciflow/inductor/162278 2025-09-07T06:39:17.6123440Z * [new tag] ciflow/inductor/162284 -> ciflow/inductor/162284 2025-09-07T06:39:17.6123504Z * [new tag] ciflow/inductor/162286 -> ciflow/inductor/162286 2025-09-07T06:39:17.6123568Z * [new tag] ciflow/inductor/162288 -> ciflow/inductor/162288 2025-09-07T06:39:17.6123634Z * [new tag] ciflow/inductor/162293 -> ciflow/inductor/162293 2025-09-07T06:39:17.6123699Z * [new tag] 
ciflow/inductor/162294 -> ciflow/inductor/162294 2025-09-07T06:39:17.6123762Z * [new tag] ciflow/inductor/162295 -> ciflow/inductor/162295 2025-09-07T06:39:17.6123827Z * [new tag] ciflow/inductor/162296 -> ciflow/inductor/162296 2025-09-07T06:39:17.6123891Z * [new tag] ciflow/inductor/162298 -> ciflow/inductor/162298 2025-09-07T06:39:17.6123955Z * [new tag] ciflow/inductor/162307 -> ciflow/inductor/162307 2025-09-07T06:39:17.6124019Z * [new tag] ciflow/inductor/162309 -> ciflow/inductor/162309 2025-09-07T06:39:17.6124084Z * [new tag] ciflow/inductor/162311 -> ciflow/inductor/162311 2025-09-07T06:39:17.6124148Z * [new tag] ciflow/inductor/162312 -> ciflow/inductor/162312 2025-09-07T06:39:17.6124212Z * [new tag] ciflow/inductor/162315 -> ciflow/inductor/162315 2025-09-07T06:39:17.6124279Z * [new tag] ciflow/inductor/162316 -> ciflow/inductor/162316 2025-09-07T06:39:17.6124342Z * [new tag] ciflow/inductor/162318 -> ciflow/inductor/162318 2025-09-07T06:39:17.6124406Z * [new tag] ciflow/inductor/162323 -> ciflow/inductor/162323 2025-09-07T06:39:17.6124472Z * [new tag] ciflow/inductor/162341 -> ciflow/inductor/162341 2025-09-07T06:39:17.6125578Z * [new tag] ciflow/inductor/162345 -> ciflow/inductor/162345 2025-09-07T06:39:17.6125650Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-09-07T06:39:17.6125721Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-09-07T06:39:17.6125789Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-09-07T06:39:17.6125863Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-09-07T06:39:17.6125939Z * [new tag] ciflow/linux-aarch64/159737 -> ciflow/linux-aarch64/159737 2025-09-07T06:39:17.6126011Z * [new tag] ciflow/linux-aarch64/160078 -> ciflow/linux-aarch64/160078 2025-09-07T06:39:17.6126072Z * [new tag] ciflow/mps/157553 -> ciflow/mps/157553 2025-09-07T06:39:17.6126167Z * [new tag] ciflow/mps/157635 -> ciflow/mps/157635 2025-09-07T06:39:17.6126226Z * [new tag] 
ciflow/mps/161988 -> ciflow/mps/161988 2025-09-07T06:39:17.6126285Z * [new tag] ciflow/mps/162108 -> ciflow/mps/162108 2025-09-07T06:39:17.6126344Z * [new tag] ciflow/mps/162153 -> ciflow/mps/162153 2025-09-07T06:39:17.6126401Z * [new tag] ciflow/mps/162281 -> ciflow/mps/162281 2025-09-07T06:39:17.6126468Z * [new tag] ciflow/nightly/156049 -> ciflow/nightly/156049 2025-09-07T06:39:17.6126664Z * [new tag] ciflow/nightly/158104 -> ciflow/nightly/158104 2025-09-07T06:39:17.6126739Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-09-07T06:39:17.6126838Z * [new tag] ciflow/periodic-rocm-mi300/161529 -> ciflow/periodic-rocm-mi300/161529 2025-09-07T06:39:17.6126933Z * [new tag] ciflow/periodic-rocm-mi300/161715 -> ciflow/periodic-rocm-mi300/161715 2025-09-07T06:39:17.6127005Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-09-07T06:39:17.6127072Z * [new tag] ciflow/periodic/156703 -> ciflow/periodic/156703 2025-09-07T06:39:17.6127138Z * [new tag] ciflow/periodic/161715 -> ciflow/periodic/161715 2025-09-07T06:39:17.6127203Z * [new tag] ciflow/periodic/162021 -> ciflow/periodic/162021 2025-09-07T06:39:17.6127268Z * [new tag] ciflow/periodic/162323 -> ciflow/periodic/162323 2025-09-07T06:39:17.6127337Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-09-07T06:39:17.6128468Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-09-07T06:39:17.6128536Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-09-07T06:39:17.6128604Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-09-07T06:39:17.6128682Z * [new tag] ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-09-07T06:39:17.6128767Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-09-07T06:39:17.6128848Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-09-07T06:39:17.6128927Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 
2025-09-07T06:39:17.6129004Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-09-07T06:39:17.6129091Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-09-07T06:39:17.6129167Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-09-07T06:39:17.6129238Z * [new tag] ciflow/rocm-mi300/154170 -> ciflow/rocm-mi300/154170 2025-09-07T06:39:17.6129304Z * [new tag] ciflow/rocm-mi300/158747 -> ciflow/rocm-mi300/158747 2025-09-07T06:39:17.6129372Z * [new tag] ciflow/rocm-mi300/159146 -> ciflow/rocm-mi300/159146 2025-09-07T06:39:17.6129436Z * [new tag] ciflow/rocm-mi300/159158 -> ciflow/rocm-mi300/159158 2025-09-07T06:39:17.6129500Z * [new tag] ciflow/rocm-mi300/161715 -> ciflow/rocm-mi300/161715 2025-09-07T06:39:17.6129564Z * [new tag] ciflow/rocm-mi300/161957 -> ciflow/rocm-mi300/161957 2025-09-07T06:39:17.6129629Z * [new tag] ciflow/rocm-mi300/162053 -> ciflow/rocm-mi300/162053 2025-09-07T06:39:17.6129696Z * [new tag] ciflow/rocm-mi300/162056 -> ciflow/rocm-mi300/162056 2025-09-07T06:39:17.6129760Z * [new tag] ciflow/rocm-mi300/162112 -> ciflow/rocm-mi300/162112 2025-09-07T06:39:17.6129889Z * [new tag] ciflow/rocm-mi300/162245 -> ciflow/rocm-mi300/162245 2025-09-07T06:39:17.6129954Z * [new tag] ciflow/rocm-mi300/162278 -> ciflow/rocm-mi300/162278 2025-09-07T06:39:17.6130017Z * [new tag] ciflow/rocm-mi300/162288 -> ciflow/rocm-mi300/162288 2025-09-07T06:39:17.6130083Z * [new tag] ciflow/rocm-mi355/162053 -> ciflow/rocm-mi355/162053 2025-09-07T06:39:17.6130147Z * [new tag] ciflow/rocm-mi355/162056 -> ciflow/rocm-mi355/162056 2025-09-07T06:39:17.6130209Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-09-07T06:39:17.6131330Z * [new tag] ciflow/rocm/154170 -> ciflow/rocm/154170 2025-09-07T06:39:17.6131436Z * [new tag] ciflow/rocm/156491 -> ciflow/rocm/156491 2025-09-07T06:39:17.6131495Z * [new tag] ciflow/rocm/156592 -> ciflow/rocm/156592 2025-09-07T06:39:17.6131555Z * [new tag] 
ciflow/rocm/158747 -> ciflow/rocm/158747 2025-09-07T06:39:17.6131614Z * [new tag] ciflow/rocm/159146 -> ciflow/rocm/159146 2025-09-07T06:39:17.6131672Z * [new tag] ciflow/rocm/159158 -> ciflow/rocm/159158 2025-09-07T06:39:17.6131731Z * [new tag] ciflow/rocm/161715 -> ciflow/rocm/161715 2025-09-07T06:39:17.6131789Z * [new tag] ciflow/rocm/161972 -> ciflow/rocm/161972 2025-09-07T06:39:17.6131847Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-09-07T06:39:17.6131906Z * [new tag] ciflow/rocm/162053 -> ciflow/rocm/162053 2025-09-07T06:39:17.6131964Z * [new tag] ciflow/rocm/162056 -> ciflow/rocm/162056 2025-09-07T06:39:17.6132022Z * [new tag] ciflow/rocm/162112 -> ciflow/rocm/162112 2025-09-07T06:39:17.6132081Z * [new tag] ciflow/rocm/162278 -> ciflow/rocm/162278 2025-09-07T06:39:17.6132141Z * [new tag] ciflow/rocm/162288 -> ciflow/rocm/162288 2025-09-07T06:39:17.6132198Z * [new tag] ciflow/rocm/162305 -> ciflow/rocm/162305 2025-09-07T06:39:17.6132261Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-09-07T06:39:17.6132321Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-09-07T06:39:17.6132507Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-09-07T06:39:17.6132568Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-09-07T06:39:17.6132628Z * [new tag] ciflow/slow/161395 -> ciflow/slow/161395 2025-09-07T06:39:17.6132687Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-09-07T06:39:17.6132747Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-09-07T06:39:17.6132809Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-09-07T06:39:17.6132868Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-09-07T06:39:17.6133975Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-09-07T06:39:17.6134038Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-09-07T06:39:17.6134098Z * [new tag] 
ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-09-07T06:39:17.6134157Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-09-07T06:39:17.6134323Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-09-07T06:39:17.6134384Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-09-07T06:39:17.6134445Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-09-07T06:39:17.6134544Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-09-07T06:39:17.6134604Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-09-07T06:39:17.6134664Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-09-07T06:39:17.6134840Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-09-07T06:39:17.6134921Z * [new tag] ciflow/triton_binaries/162329 -> ciflow/triton_binaries/162329 2025-09-07T06:39:17.6134981Z * [new tag] ciflow/trunk/113258 -> ciflow/trunk/113258 2025-09-07T06:39:17.6135079Z * [new tag] ciflow/trunk/137400 -> ciflow/trunk/137400 2025-09-07T06:39:17.6135138Z * [new tag] ciflow/trunk/148180 -> ciflow/trunk/148180 2025-09-07T06:39:17.6135198Z * [new tag] ciflow/trunk/148328 -> ciflow/trunk/148328 2025-09-07T06:39:17.6135258Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-09-07T06:39:17.6135317Z * [new tag] ciflow/trunk/148919 -> ciflow/trunk/148919 2025-09-07T06:39:17.6135376Z * [new tag] ciflow/trunk/152624 -> ciflow/trunk/152624 2025-09-07T06:39:17.6135435Z * [new tag] ciflow/trunk/154170 -> ciflow/trunk/154170 2025-09-07T06:39:17.6135493Z * [new tag] ciflow/trunk/154694 -> ciflow/trunk/154694 2025-09-07T06:39:17.6135552Z * [new tag] ciflow/trunk/156049 -> ciflow/trunk/156049 2025-09-07T06:39:17.6135612Z * [new tag] ciflow/trunk/156703 -> ciflow/trunk/156703 2025-09-07T06:39:17.6135672Z * [new tag] ciflow/trunk/156711 -> ciflow/trunk/156711 
2025-09-07T06:39:17.6136858Z * [new tag] ciflow/trunk/157432 -> ciflow/trunk/157432 2025-09-07T06:39:17.6136923Z * [new tag] ciflow/trunk/157685 -> ciflow/trunk/157685 2025-09-07T06:39:17.6136982Z * [new tag] ciflow/trunk/157689 -> ciflow/trunk/157689 2025-09-07T06:39:17.6137041Z * [new tag] ciflow/trunk/157699 -> ciflow/trunk/157699 2025-09-07T06:39:17.6137100Z * [new tag] ciflow/trunk/157813 -> ciflow/trunk/157813 2025-09-07T06:39:17.6137160Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-09-07T06:39:17.6137219Z * [new tag] ciflow/trunk/158091 -> ciflow/trunk/158091 2025-09-07T06:39:17.6137277Z * [new tag] ciflow/trunk/158104 -> ciflow/trunk/158104 2025-09-07T06:39:17.6137338Z * [new tag] ciflow/trunk/158404 -> ciflow/trunk/158404 2025-09-07T06:39:17.6137396Z * [new tag] ciflow/trunk/158647 -> ciflow/trunk/158647 2025-09-07T06:39:17.6137456Z * [new tag] ciflow/trunk/158846 -> ciflow/trunk/158846 2025-09-07T06:39:17.6137516Z * [new tag] ciflow/trunk/159158 -> ciflow/trunk/159158 2025-09-07T06:39:17.6137573Z * [new tag] ciflow/trunk/159682 -> ciflow/trunk/159682 2025-09-07T06:39:17.6137632Z * [new tag] ciflow/trunk/159835 -> ciflow/trunk/159835 2025-09-07T06:39:17.6137691Z * [new tag] ciflow/trunk/160161 -> ciflow/trunk/160161 2025-09-07T06:39:17.6137749Z * [new tag] ciflow/trunk/160236 -> ciflow/trunk/160236 2025-09-07T06:39:17.6137807Z * [new tag] ciflow/trunk/160329 -> ciflow/trunk/160329 2025-09-07T06:39:17.6137868Z * [new tag] ciflow/trunk/160480 -> ciflow/trunk/160480 2025-09-07T06:39:17.6137927Z * [new tag] ciflow/trunk/160532 -> ciflow/trunk/160532 2025-09-07T06:39:17.6137986Z * [new tag] ciflow/trunk/160836 -> ciflow/trunk/160836 2025-09-07T06:39:17.6138107Z * [new tag] ciflow/trunk/160843 -> ciflow/trunk/160843 2025-09-07T06:39:17.6138166Z * [new tag] ciflow/trunk/160869 -> ciflow/trunk/160869 2025-09-07T06:39:17.6138225Z * [new tag] ciflow/trunk/160940 -> ciflow/trunk/160940 2025-09-07T06:39:17.6138283Z * [new tag] ciflow/trunk/160943 -> 
ciflow/trunk/160943 2025-09-07T06:39:17.6139397Z * [new tag] ciflow/trunk/160953 -> ciflow/trunk/160953 2025-09-07T06:39:17.6139459Z * [new tag] ciflow/trunk/161035 -> ciflow/trunk/161035 2025-09-07T06:39:17.6139574Z * [new tag] ciflow/trunk/161178 -> ciflow/trunk/161178 2025-09-07T06:39:17.6139633Z * [new tag] ciflow/trunk/161349 -> ciflow/trunk/161349 2025-09-07T06:39:17.6139691Z * [new tag] ciflow/trunk/161350 -> ciflow/trunk/161350 2025-09-07T06:39:17.6139751Z * [new tag] ciflow/trunk/161351 -> ciflow/trunk/161351 2025-09-07T06:39:17.6139810Z * [new tag] ciflow/trunk/161395 -> ciflow/trunk/161395 2025-09-07T06:39:17.6139868Z * [new tag] ciflow/trunk/161405 -> ciflow/trunk/161405 2025-09-07T06:39:17.6139927Z * [new tag] ciflow/trunk/161406 -> ciflow/trunk/161406 2025-09-07T06:39:17.6139986Z * [new tag] ciflow/trunk/161410 -> ciflow/trunk/161410 2025-09-07T06:39:17.6140044Z * [new tag] ciflow/trunk/161468 -> ciflow/trunk/161468 2025-09-07T06:39:17.6140102Z * [new tag] ciflow/trunk/161499 -> ciflow/trunk/161499 2025-09-07T06:39:17.6140163Z * [new tag] ciflow/trunk/161527 -> ciflow/trunk/161527 2025-09-07T06:39:17.6140222Z * [new tag] ciflow/trunk/161534 -> ciflow/trunk/161534 2025-09-07T06:39:17.6140282Z * [new tag] ciflow/trunk/161591 -> ciflow/trunk/161591 2025-09-07T06:39:17.6140342Z * [new tag] ciflow/trunk/161595 -> ciflow/trunk/161595 2025-09-07T06:39:17.6140400Z * [new tag] ciflow/trunk/161596 -> ciflow/trunk/161596 2025-09-07T06:39:17.6140458Z * [new tag] ciflow/trunk/161633 -> ciflow/trunk/161633 2025-09-07T06:39:17.6140518Z * [new tag] ciflow/trunk/161634 -> ciflow/trunk/161634 2025-09-07T06:39:17.6140576Z * [new tag] ciflow/trunk/161635 -> ciflow/trunk/161635 2025-09-07T06:39:17.6140635Z * [new tag] ciflow/trunk/161667 -> ciflow/trunk/161667 2025-09-07T06:39:17.6140695Z * [new tag] ciflow/trunk/161670 -> ciflow/trunk/161670 2025-09-07T06:39:17.6140755Z * [new tag] ciflow/trunk/161692 -> ciflow/trunk/161692 2025-09-07T06:39:17.6140814Z * [new tag] 
ciflow/trunk/161693 -> ciflow/trunk/161693 2025-09-07T06:39:17.6141913Z * [new tag] ciflow/trunk/161695 -> ciflow/trunk/161695 2025-09-07T06:39:17.6141975Z * [new tag] ciflow/trunk/161730 -> ciflow/trunk/161730 2025-09-07T06:39:17.6142034Z * [new tag] ciflow/trunk/161744 -> ciflow/trunk/161744 2025-09-07T06:39:17.6142092Z * [new tag] ciflow/trunk/161749 -> ciflow/trunk/161749 2025-09-07T06:39:17.6142152Z * [new tag] ciflow/trunk/161881 -> ciflow/trunk/161881 2025-09-07T06:39:17.6142211Z * [new tag] ciflow/trunk/161924 -> ciflow/trunk/161924 2025-09-07T06:39:17.6142269Z * [new tag] ciflow/trunk/161926 -> ciflow/trunk/161926 2025-09-07T06:39:17.6142331Z * [new tag] ciflow/trunk/161936 -> ciflow/trunk/161936 2025-09-07T06:39:17.6142389Z * [new tag] ciflow/trunk/161952 -> ciflow/trunk/161952 2025-09-07T06:39:17.6142483Z * [new tag] ciflow/trunk/161955 -> ciflow/trunk/161955 2025-09-07T06:39:17.6142544Z * [new tag] ciflow/trunk/161957 -> ciflow/trunk/161957 2025-09-07T06:39:17.6142603Z * [new tag] ciflow/trunk/161959 -> ciflow/trunk/161959 2025-09-07T06:39:17.6142662Z * [new tag] ciflow/trunk/161977 -> ciflow/trunk/161977 2025-09-07T06:39:17.6142721Z * [new tag] ciflow/trunk/161988 -> ciflow/trunk/161988 2025-09-07T06:39:17.6142781Z * [new tag] ciflow/trunk/161994 -> ciflow/trunk/161994 2025-09-07T06:39:17.6142839Z * [new tag] ciflow/trunk/162007 -> ciflow/trunk/162007 2025-09-07T06:39:17.6142929Z * [new tag] ciflow/trunk/162013 -> ciflow/trunk/162013 2025-09-07T06:39:17.6142988Z * [new tag] ciflow/trunk/162017 -> ciflow/trunk/162017 2025-09-07T06:39:17.6143048Z * [new tag] ciflow/trunk/162021 -> ciflow/trunk/162021 2025-09-07T06:39:17.6143108Z * [new tag] ciflow/trunk/162022 -> ciflow/trunk/162022 2025-09-07T06:39:17.6143166Z * [new tag] ciflow/trunk/162040 -> ciflow/trunk/162040 2025-09-07T06:39:17.6143225Z * [new tag] ciflow/trunk/162041 -> ciflow/trunk/162041 2025-09-07T06:39:17.6143284Z * [new tag] ciflow/trunk/162062 -> ciflow/trunk/162062 
2025-09-07T06:39:17.6143344Z * [new tag] ciflow/trunk/162066 -> ciflow/trunk/162066 2025-09-07T06:39:17.6143402Z * [new tag] ciflow/trunk/162089 -> ciflow/trunk/162089 2025-09-07T06:39:17.6144507Z * [new tag] ciflow/trunk/162099 -> ciflow/trunk/162099 2025-09-07T06:39:17.6144569Z * [new tag] ciflow/trunk/162104 -> ciflow/trunk/162104 2025-09-07T06:39:17.6144627Z * [new tag] ciflow/trunk/162106 -> ciflow/trunk/162106 2025-09-07T06:39:17.6144687Z * [new tag] ciflow/trunk/162112 -> ciflow/trunk/162112 2025-09-07T06:39:17.6144746Z * [new tag] ciflow/trunk/162119 -> ciflow/trunk/162119 2025-09-07T06:39:17.6144805Z * [new tag] ciflow/trunk/162142 -> ciflow/trunk/162142 2025-09-07T06:39:17.6144863Z * [new tag] ciflow/trunk/162169 -> ciflow/trunk/162169 2025-09-07T06:39:17.6144923Z * [new tag] ciflow/trunk/162183 -> ciflow/trunk/162183 2025-09-07T06:39:17.6144981Z * [new tag] ciflow/trunk/162190 -> ciflow/trunk/162190 2025-09-07T06:39:17.6145040Z * [new tag] ciflow/trunk/162194 -> ciflow/trunk/162194 2025-09-07T06:39:17.6145102Z * [new tag] ciflow/trunk/162200 -> ciflow/trunk/162200 2025-09-07T06:39:17.6145160Z * [new tag] ciflow/trunk/162206 -> ciflow/trunk/162206 2025-09-07T06:39:17.6145220Z * [new tag] ciflow/trunk/162208 -> ciflow/trunk/162208 2025-09-07T06:39:17.6145278Z * [new tag] ciflow/trunk/162222 -> ciflow/trunk/162222 2025-09-07T06:39:17.6145432Z * [new tag] ciflow/trunk/162238 -> ciflow/trunk/162238 2025-09-07T06:39:17.6145490Z * [new tag] ciflow/trunk/162244 -> ciflow/trunk/162244 2025-09-07T06:39:17.6145551Z * [new tag] ciflow/trunk/162267 -> ciflow/trunk/162267 2025-09-07T06:39:17.6145609Z * [new tag] ciflow/trunk/162269 -> ciflow/trunk/162269 2025-09-07T06:39:17.6145667Z * [new tag] ciflow/trunk/162278 -> ciflow/trunk/162278 2025-09-07T06:39:17.6145729Z * [new tag] ciflow/trunk/162286 -> ciflow/trunk/162286 2025-09-07T06:39:17.6145787Z * [new tag] ciflow/trunk/162288 -> ciflow/trunk/162288 2025-09-07T06:39:17.6145846Z * [new tag] ciflow/trunk/162293 -> 
ciflow/trunk/162293 2025-09-07T06:39:17.6147153Z * [new tag] ciflow/trunk/162310 -> ciflow/trunk/162310 2025-09-07T06:39:17.6147219Z * [new tag] ciflow/trunk/162311 -> ciflow/trunk/162311 2025-09-07T06:39:17.6147278Z * [new tag] ciflow/trunk/162315 -> ciflow/trunk/162315 2025-09-07T06:39:17.6147338Z * [new tag] ciflow/trunk/162325 -> ciflow/trunk/162325 2025-09-07T06:39:17.6147396Z * [new tag] ciflow/trunk/162328 -> ciflow/trunk/162328 2025-09-07T06:39:17.6147456Z * [new tag] ciflow/trunk/162329 -> ciflow/trunk/162329 2025-09-07T06:39:17.6147567Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-09-07T06:39:17.6147628Z * [new tag] ciflow/vllm/162292 -> ciflow/vllm/162292 2025-09-07T06:39:17.6147697Z * [new tag] ciflow/win-arm64/156049 -> ciflow/win-arm64/156049 2025-09-07T06:39:17.6147765Z * [new tag] ciflow/win-arm64/158104 -> ciflow/win-arm64/158104 2025-09-07T06:39:17.6147828Z * [new tag] ciflow/xpu/157699 -> ciflow/xpu/157699 2025-09-07T06:39:17.6147887Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-09-07T06:39:17.6147946Z * [new tag] ciflow/xpu/159459 -> ciflow/xpu/159459 2025-09-07T06:39:17.6148005Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-09-07T06:39:17.6148062Z * [new tag] ciflow/xpu/159944 -> ciflow/xpu/159944 2025-09-07T06:39:17.6148120Z * [new tag] ciflow/xpu/160867 -> ciflow/xpu/160867 2025-09-07T06:39:17.6148180Z * [new tag] ciflow/xpu/160938 -> ciflow/xpu/160938 2025-09-07T06:39:17.6148238Z * [new tag] ciflow/xpu/160940 -> ciflow/xpu/160940 2025-09-07T06:39:17.6148295Z * [new tag] ciflow/xpu/160953 -> ciflow/xpu/160953 2025-09-07T06:39:17.6148355Z * [new tag] ciflow/xpu/161045 -> ciflow/xpu/161045 2025-09-07T06:39:17.6148413Z * [new tag] ciflow/xpu/161058 -> ciflow/xpu/161058 2025-09-07T06:39:17.6148471Z * [new tag] ciflow/xpu/161246 -> ciflow/xpu/161246 2025-09-07T06:39:17.6148529Z * [new tag] ciflow/xpu/161397 -> ciflow/xpu/161397 2025-09-07T06:39:17.6148587Z * [new tag] ciflow/xpu/161485 -> ciflow/xpu/161485 
2025-09-07T06:39:17.6148645Z * [new tag] ciflow/xpu/161988 -> ciflow/xpu/161988 2025-09-07T06:39:17.6148704Z * [new tag] ciflow/xpu/162062 -> ciflow/xpu/162062 2025-09-07T06:39:17.6148764Z * [new tag] cslpull75 -> cslpull75 2025-09-07T06:39:17.6148820Z * [new tag] cslpull76 -> cslpull76 2025-09-07T06:39:17.6149908Z * [new tag] cslpull77 -> cslpull77 2025-09-07T06:39:17.6149965Z * [new tag] cslpull78 -> cslpull78 2025-09-07T06:39:17.6150018Z * [new tag] cslpull79 -> cslpull79 2025-09-07T06:39:17.6150070Z * [new tag] cslpull80 -> cslpull80 2025-09-07T06:39:17.6150122Z * [new tag] cslpull81 -> cslpull81 2025-09-07T06:39:17.6150174Z * [new tag] cslpull82 -> cslpull82 2025-09-07T06:39:17.6150225Z * [new tag] cslpull83 -> cslpull83 2025-09-07T06:39:17.6150277Z * [new tag] cslpull84 -> cslpull84 2025-09-07T06:39:17.6150330Z * [new tag] cslpull85 -> cslpull85 2025-09-07T06:39:17.6150381Z * [new tag] cslpull86 -> cslpull86 2025-09-07T06:39:17.6150432Z * [new tag] cslpull87 -> cslpull87 2025-09-07T06:39:17.6150521Z * [new tag] cslpull88 -> cslpull88 2025-09-07T06:39:17.6150573Z * [new tag] cslpull89 -> cslpull89 2025-09-07T06:39:17.6150624Z * [new tag] cslpull90 -> cslpull90 2025-09-07T06:39:17.6150676Z * [new tag] cslpull91 -> cslpull91 2025-09-07T06:39:17.6150727Z * [new tag] cslpull92 -> cslpull92 2025-09-07T06:39:17.6150783Z * [new tag] flight_5 -> flight_5 2025-09-07T06:39:17.6150840Z * [new tag] flight_5.1 -> flight_5.1 2025-09-07T06:39:17.6150932Z * [new tag] flight_5.2 -> flight_5.2 2025-09-07T06:39:17.6150986Z * [new tag] flight_5.3 -> flight_5.3 2025-09-07T06:39:17.6151042Z * [new tag] forpull1 -> forpull1 2025-09-07T06:39:17.6151109Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-09-07T06:39:17.6151172Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-09-07T06:39:17.6151234Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-09-07T06:39:17.6152312Z * [new tag] nightly-binary -> nightly-binary 2025-09-07T06:39:17.6152381Z * [new tag] 
sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-09-07T06:39:17.6152443Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-09-07T06:39:17.6152575Z * [new tag] trunk/00636e0171e7e733628c408084805442270cf608 -> trunk/00636e0171e7e733628c408084805442270cf608 2025-09-07T06:39:17.6152710Z * [new tag] trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 -> trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 2025-09-07T06:39:17.6152843Z * [new tag] trunk/01ab325cc2e0dc221af4d710974e1b9175066544 -> trunk/01ab325cc2e0dc221af4d710974e1b9175066544 2025-09-07T06:39:17.6152978Z * [new tag] trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b -> trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b 2025-09-07T06:39:17.6153107Z * [new tag] trunk/040d00af048967dde7938d358d7f5988cbd18388 -> trunk/040d00af048967dde7938d358d7f5988cbd18388 2025-09-07T06:39:17.6153239Z * [new tag] trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 -> trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 2025-09-07T06:39:17.6153364Z * [new tag] trunk/047603d35bdc70046216384838d6340feab79bf4 -> trunk/047603d35bdc70046216384838d6340feab79bf4 2025-09-07T06:39:17.6153498Z * [new tag] trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 -> trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 2025-09-07T06:39:17.6153626Z * [new tag] trunk/081cab045472ce045634548cc6c14a4870641e23 -> trunk/081cab045472ce045634548cc6c14a4870641e23 2025-09-07T06:39:17.6153757Z * [new tag] trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 -> trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 2025-09-07T06:39:17.6153885Z * [new tag] trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 -> trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 2025-09-07T06:39:17.6154013Z * [new tag] trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 -> trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 2025-09-07T06:39:17.6154145Z * [new tag] trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 -> trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 2025-09-07T06:39:17.6154274Z * [new tag] 
trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 -> trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 2025-09-07T06:39:17.6154408Z * [new tag] trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 -> trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 2025-09-07T06:39:17.6154580Z * [new tag] trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c -> trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c 2025-09-07T06:39:17.6154716Z * [new tag] trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 -> trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 2025-09-07T06:39:17.6154844Z * [new tag] trunk/0d84ff3b78f55492d3d4708458c92d776274939e -> trunk/0d84ff3b78f55492d3d4708458c92d776274939e 2025-09-07T06:39:17.6154976Z * [new tag] trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 -> trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 2025-09-07T06:39:17.6155111Z * [new tag] trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f -> trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f 2025-09-07T06:39:17.6155239Z * [new tag] trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f -> trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f 2025-09-07T06:39:17.6156424Z * [new tag] trunk/12814701555d3e41dfcdf8f9273af5821e322df0 -> trunk/12814701555d3e41dfcdf8f9273af5821e322df0 2025-09-07T06:39:17.6156626Z * [new tag] trunk/13b65196db422bdb394cb482e208c61ed448898c -> trunk/13b65196db422bdb394cb482e208c61ed448898c 2025-09-07T06:39:17.6156761Z * [new tag] trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 -> trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 2025-09-07T06:39:17.6156892Z * [new tag] trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 -> trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 2025-09-07T06:39:17.6157019Z * [new tag] trunk/146371483318e17929daefd37c8e459d9d6d47bb -> trunk/146371483318e17929daefd37c8e459d9d6d47bb 2025-09-07T06:39:17.6157150Z * [new tag] trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 -> trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 2025-09-07T06:39:17.6157283Z * [new tag] trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 -> 
trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 2025-09-07T06:39:17.6157412Z * [new tag] trunk/190c391a28845a14df26abb228d26aa813efb20c -> trunk/190c391a28845a14df26abb228d26aa813efb20c 2025-09-07T06:39:17.6157547Z * [new tag] trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 -> trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 2025-09-07T06:39:17.6157679Z * [new tag] trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 -> trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 2025-09-07T06:39:17.6157807Z * [new tag] trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 -> trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 2025-09-07T06:39:17.6157939Z * [new tag] trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a -> trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a 2025-09-07T06:39:17.6158167Z * [new tag] trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 -> trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 2025-09-07T06:39:17.6158302Z * [new tag] trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 -> trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 2025-09-07T06:39:17.6158432Z * [new tag] trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a -> trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a 2025-09-07T06:39:17.6158559Z * [new tag] trunk/1f820de639c75a1562d3fb03f160439f853ae07b -> trunk/1f820de639c75a1562d3fb03f160439f853ae07b 2025-09-07T06:39:17.6158687Z * [new tag] trunk/204697f0e695d82894c5010fbec664c4391f90cc -> trunk/204697f0e695d82894c5010fbec664c4391f90cc 2025-09-07T06:39:17.6158814Z * [new tag] trunk/20629b1619fe636227d01fc85ba221daa7185a05 -> trunk/20629b1619fe636227d01fc85ba221daa7185a05 2025-09-07T06:39:17.6158944Z * [new tag] trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 -> trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 2025-09-07T06:39:17.6159076Z * [new tag] trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd -> trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd 2025-09-07T06:39:17.6159208Z * [new tag] trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 -> trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 
2025-09-07T06:39:17.6159393Z * [new tag] trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f -> trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f 2025-09-07T06:39:17.6159524Z * [new tag] trunk/25f4aaed9ec26f39c13862323ff8582006473d23 -> trunk/25f4aaed9ec26f39c13862323ff8582006473d23 2025-09-07T06:39:17.6159650Z * [new tag] trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 -> trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 2025-09-07T06:39:17.6160821Z * [new tag] trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f -> trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f 2025-09-07T06:39:17.6160957Z * [new tag] trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 -> trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 2025-09-07T06:39:17.6161140Z * [new tag] trunk/29280864d941e6108ab57f7298f520c0cf9696e9 -> trunk/29280864d941e6108ab57f7298f520c0cf9696e9 2025-09-07T06:39:17.6161269Z * [new tag] trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 -> trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 2025-09-07T06:39:17.6161403Z * [new tag] trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef -> trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef 2025-09-07T06:39:17.6161531Z * [new tag] trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c -> trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c 2025-09-07T06:39:17.6161658Z * [new tag] trunk/2ba65472dd54488a86a50326ea990195fc6732d6 -> trunk/2ba65472dd54488a86a50326ea990195fc6732d6 2025-09-07T06:39:17.6161790Z * [new tag] trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 -> trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 2025-09-07T06:39:17.6161917Z * [new tag] trunk/2dd529df0092799f68ee7afcf52338276906706a -> trunk/2dd529df0092799f68ee7afcf52338276906706a 2025-09-07T06:39:17.6162050Z * [new tag] trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 -> trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 2025-09-07T06:39:17.6162181Z * [new tag] trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 -> trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 2025-09-07T06:39:17.6162311Z * [new tag] 
trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 -> trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 2025-09-07T06:39:17.6162440Z * [new tag] trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 -> trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 2025-09-07T06:39:17.6162565Z * [new tag] trunk/34aa78274d6770086025a967fa63a86830e08176 -> trunk/34aa78274d6770086025a967fa63a86830e08176 2025-09-07T06:39:17.6162694Z * [new tag] trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 -> trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 2025-09-07T06:39:17.6162827Z * [new tag] trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b -> trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b 2025-09-07T06:39:17.6162957Z * [new tag] trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 -> trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 2025-09-07T06:39:17.6163087Z * [new tag] trunk/3771380f83fcac154a7c89ad679311d8c4818287 -> trunk/3771380f83fcac154a7c89ad679311d8c4818287 2025-09-07T06:39:17.6163214Z * [new tag] trunk/3a207816cc569f78863d86c01f2a3d265350e39f -> trunk/3a207816cc569f78863d86c01f2a3d265350e39f 2025-09-07T06:39:17.6163346Z * [new tag] trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 -> trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 2025-09-07T06:39:17.6163479Z * [new tag] trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 -> trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 2025-09-07T06:39:17.6163610Z * [new tag] trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f -> trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f 2025-09-07T06:39:17.6163745Z * [new tag] trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf -> trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf 2025-09-07T06:39:17.6163914Z * [new tag] trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 -> trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 2025-09-07T06:39:17.6165071Z * [new tag] trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d -> trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d 2025-09-07T06:39:17.6165202Z * [new tag] trunk/420c52ecf36f86d32da0853bfbe074b682b070aa -> 
trunk/420c52ecf36f86d32da0853bfbe074b682b070aa 2025-09-07T06:39:17.6165331Z * [new tag] trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 -> trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 2025-09-07T06:39:17.6165458Z * [new tag] trunk/451ed931562ec8b46d1f7e6c266a68132a119336 -> trunk/451ed931562ec8b46d1f7e6c266a68132a119336 2025-09-07T06:39:17.6165622Z * [new tag] trunk/480c7391126656154318fabf1d57ebc01e196e63 -> trunk/480c7391126656154318fabf1d57ebc01e196e63 2025-09-07T06:39:17.6165753Z * [new tag] trunk/48bedd753da22634aa94fbafeb731e82025404f3 -> trunk/48bedd753da22634aa94fbafeb731e82025404f3 2025-09-07T06:39:17.6165881Z * [new tag] trunk/494878a11b79071ada0b98f34042d47155be6d1c -> trunk/494878a11b79071ada0b98f34042d47155be6d1c 2025-09-07T06:39:17.6166013Z * [new tag] trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 -> trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 2025-09-07T06:39:17.6166142Z * [new tag] trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf -> trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf 2025-09-07T06:39:17.6166281Z * [new tag] trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e -> trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e 2025-09-07T06:39:17.6166417Z * [new tag] trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 -> trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 2025-09-07T06:39:17.6166627Z * [new tag] trunk/4f72d932feee0749397fec876dcd43994f50b215 -> trunk/4f72d932feee0749397fec876dcd43994f50b215 2025-09-07T06:39:17.6166759Z * [new tag] trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d -> trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d 2025-09-07T06:39:17.6166888Z * [new tag] trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 -> trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 2025-09-07T06:39:17.6167017Z * [new tag] trunk/524b78d4f67045b83bb69edc56ab16efe282971c -> trunk/524b78d4f67045b83bb69edc56ab16efe282971c 2025-09-07T06:39:17.6167153Z * [new tag] trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 -> trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 
2025-09-07T06:39:17.6167277Z * [new tag] trunk/5561e45758d59c94605873d5db48ed459c004c3b -> trunk/5561e45758d59c94605873d5db48ed459c004c3b 2025-09-07T06:39:17.6167405Z * [new tag] trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 -> trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 2025-09-07T06:39:17.6167534Z * [new tag] trunk/5927a70934ccf7b70182d364c23245a7dd685503 -> trunk/5927a70934ccf7b70182d364c23245a7dd685503 2025-09-07T06:39:17.6167666Z * [new tag] trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 -> trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 2025-09-07T06:39:17.6167799Z * [new tag] trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 -> trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 2025-09-07T06:39:17.6167933Z * [new tag] trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 -> trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 2025-09-07T06:39:17.6168060Z * [new tag] trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 -> trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 2025-09-07T06:39:17.6168188Z * [new tag] trunk/5da573c42c332bc68d4b7946c69f690a876d951a -> trunk/5da573c42c332bc68d4b7946c69f690a876d951a 2025-09-07T06:39:17.6169348Z * [new tag] trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 -> trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 2025-09-07T06:39:17.6169483Z * [new tag] trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 -> trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 2025-09-07T06:39:17.6169692Z * [new tag] trunk/600c25e9a17fe56e3dee872be8854db08916ba0c -> trunk/600c25e9a17fe56e3dee872be8854db08916ba0c 2025-09-07T06:39:17.6169826Z * [new tag] trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 -> trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 2025-09-07T06:39:17.6169953Z * [new tag] trunk/6087ef41e54c2494b117ffd923faf20f515a6806 -> trunk/6087ef41e54c2494b117ffd923faf20f515a6806 2025-09-07T06:39:17.6170085Z * [new tag] trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 -> trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 2025-09-07T06:39:17.6170215Z * [new tag] 
trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 -> trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 2025-09-07T06:39:17.6170403Z * [new tag] trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 -> trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 2025-09-07T06:39:17.6170529Z * [new tag] trunk/65985937d97505f648b6ed852c3129f2dd08b251 -> trunk/65985937d97505f648b6ed852c3129f2dd08b251 2025-09-07T06:39:17.6170657Z * [new tag] trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 -> trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 2025-09-07T06:39:17.6170783Z * [new tag] trunk/6737e2c996990024187ba620d2764f3b6f6add2c -> trunk/6737e2c996990024187ba620d2764f3b6f6add2c 2025-09-07T06:39:17.6170912Z * [new tag] trunk/67c31dcd364f10072a55f4a30ffd1151c686283a -> trunk/67c31dcd364f10072a55f4a30ffd1151c686283a 2025-09-07T06:39:17.6171042Z * [new tag] trunk/68738beff73e9c3512e18b4edea811a897ce42db -> trunk/68738beff73e9c3512e18b4edea811a897ce42db 2025-09-07T06:39:17.6171169Z * [new tag] trunk/69a25f68884a168550695fdb1a7c310c54d29536 -> trunk/69a25f68884a168550695fdb1a7c310c54d29536 2025-09-07T06:39:17.6171295Z * [new tag] trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f -> trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f 2025-09-07T06:39:17.6171426Z * [new tag] trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 -> trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 2025-09-07T06:39:17.6171553Z * [new tag] trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b -> trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b 2025-09-07T06:39:17.6171681Z * [new tag] trunk/70d36e047dfb3488fd6335016711a784d810ebda -> trunk/70d36e047dfb3488fd6335016711a784d810ebda 2025-09-07T06:39:17.6171810Z * [new tag] trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b -> trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b 2025-09-07T06:39:17.6171941Z * [new tag] trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 -> trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 2025-09-07T06:39:17.6172071Z * [new tag] trunk/73eb4511fb863a37944342b7e92aae706de603c8 -> 
trunk/73eb4511fb863a37944342b7e92aae706de603c8 2025-09-07T06:39:17.6172202Z * [new tag] trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b -> trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b 2025-09-07T06:39:17.6172328Z * [new tag] trunk/771f369448321a387f2018535bc8b8b6e5f12fab -> trunk/771f369448321a387f2018535bc8b8b6e5f12fab 2025-09-07T06:39:17.6173477Z * [new tag] trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 -> trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 2025-09-07T06:39:17.6173608Z * [new tag] trunk/791eff96c85678c950888f9da24650083ee673fe -> trunk/791eff96c85678c950888f9da24650083ee673fe 2025-09-07T06:39:17.6173740Z * [new tag] trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 -> trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 2025-09-07T06:39:17.6173872Z * [new tag] trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 -> trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 2025-09-07T06:39:17.6174001Z * [new tag] trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 -> trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 2025-09-07T06:39:17.6174168Z * [new tag] trunk/8076a185c85112be62be292eb47409c88a585b1c -> trunk/8076a185c85112be62be292eb47409c88a585b1c 2025-09-07T06:39:17.6174295Z * [new tag] trunk/80dd397f1979371a5583fa3d5c7352029522a78d -> trunk/80dd397f1979371a5583fa3d5c7352029522a78d 2025-09-07T06:39:17.6174418Z * [new tag] trunk/8171d6052ec12628eb67e0040839314056014429 -> trunk/8171d6052ec12628eb67e0040839314056014429 2025-09-07T06:39:17.6174549Z * [new tag] trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 -> trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 2025-09-07T06:39:17.6174676Z * [new tag] trunk/81b7b16618bda250ce55982894a83dc0805eb64c -> trunk/81b7b16618bda250ce55982894a83dc0805eb64c 2025-09-07T06:39:17.6174835Z * [new tag] trunk/827f0d405448de31f79d1089f7d7fceab2f87895 -> trunk/827f0d405448de31f79d1089f7d7fceab2f87895 2025-09-07T06:39:17.6174964Z * [new tag] trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 -> trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 
2025-09-07T06:39:17.6175095Z * [new tag] trunk/850e1382a9c56bfde18af09d3e72352d775e9435 -> trunk/850e1382a9c56bfde18af09d3e72352d775e9435 2025-09-07T06:39:17.6175222Z * [new tag] trunk/8678d831c48e616b717bff50f2d03141d2e9f965 -> trunk/8678d831c48e616b717bff50f2d03141d2e9f965 2025-09-07T06:39:17.6175352Z * [new tag] trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 -> trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 2025-09-07T06:39:17.6175483Z * [new tag] trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 -> trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 2025-09-07T06:39:17.6175616Z * [new tag] trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 -> trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 2025-09-07T06:39:17.6175747Z * [new tag] trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 -> trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 2025-09-07T06:39:17.6175878Z * [new tag] trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 -> trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 2025-09-07T06:39:17.6176005Z * [new tag] trunk/890626632def7e0ef95a2d01e87a0e4627824a9f -> trunk/890626632def7e0ef95a2d01e87a0e4627824a9f 2025-09-07T06:39:17.6176135Z * [new tag] trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 -> trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 2025-09-07T06:39:17.6176263Z * [new tag] trunk/89d41d3f61d04f14730ec26f008a59bef6624610 -> trunk/89d41d3f61d04f14730ec26f008a59bef6624610 2025-09-07T06:39:17.6176393Z * [new tag] trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 -> trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 2025-09-07T06:39:17.6176592Z * [new tag] trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af -> trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af 2025-09-07T06:39:17.6177762Z * [new tag] trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 -> trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 2025-09-07T06:39:17.6177898Z * [new tag] trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d -> trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d 2025-09-07T06:39:17.6178026Z * [new tag] 
trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 -> trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 2025-09-07T06:39:17.6178156Z * [new tag] trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 -> trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 2025-09-07T06:39:17.6178284Z * [new tag] trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab -> trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab 2025-09-07T06:39:17.6178415Z * [new tag] trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d -> trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d 2025-09-07T06:39:17.6178546Z * [new tag] trunk/93fb23d6fae7c4e82c4239a1033e522088742634 -> trunk/93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:39:17.6178676Z * [new tag] trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c -> trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c 2025-09-07T06:39:17.6178873Z * [new tag] trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e -> trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e 2025-09-07T06:39:17.6179002Z * [new tag] trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 -> trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 2025-09-07T06:39:17.6179127Z * [new tag] trunk/9499c8761cd2067feb9877414e818f6fd00290f1 -> trunk/9499c8761cd2067feb9877414e818f6fd00290f1 2025-09-07T06:39:17.6179259Z * [new tag] trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 -> trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 2025-09-07T06:39:17.6179387Z * [new tag] trunk/98374612fc2febd686be20761e56bdc2424bc36a -> trunk/98374612fc2febd686be20761e56bdc2424bc36a 2025-09-07T06:39:17.6179573Z * [new tag] trunk/98efc9e93d8fc61eb53cb91378443617cb550500 -> trunk/98efc9e93d8fc61eb53cb91378443617cb550500 2025-09-07T06:39:17.6179710Z * [new tag] trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 -> trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 2025-09-07T06:39:17.6179838Z * [new tag] trunk/99f356fa58c8d726cef022d8710f5491291158f6 -> trunk/99f356fa58c8d726cef022d8710f5491291158f6 2025-09-07T06:39:17.6179969Z * [new tag] trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 -> 
trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 2025-09-07T06:39:17.6180101Z * [new tag] trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd -> trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd 2025-09-07T06:39:17.6180229Z * [new tag] trunk/9aedb3cd87b52160872173c177f61053d97bed57 -> trunk/9aedb3cd87b52160872173c177f61053d97bed57 2025-09-07T06:39:17.6180357Z * [new tag] trunk/9b81fe281da41f2421506339d26b027a468902f4 -> trunk/9b81fe281da41f2421506339d26b027a468902f4 2025-09-07T06:39:17.6180492Z * [new tag] trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e -> trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e 2025-09-07T06:39:17.6180623Z * [new tag] trunk/9c03d6be87eedc06e524e202e07a7e776551a839 -> trunk/9c03d6be87eedc06e524e202e07a7e776551a839 2025-09-07T06:39:17.6180752Z * [new tag] trunk/9c957723a0fedd9c637e63e023a613019e2cab60 -> trunk/9c957723a0fedd9c637e63e023a613019e2cab60 2025-09-07T06:39:17.6180880Z * [new tag] trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 -> trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 2025-09-07T06:39:17.6182032Z * [new tag] trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 -> trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 2025-09-07T06:39:17.6182166Z * [new tag] trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 -> trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 2025-09-07T06:39:17.6182301Z * [new tag] trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 -> trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 2025-09-07T06:39:17.6182428Z * [new tag] trunk/a3c7f77e50f900721817934120d60c2361b3c40d -> trunk/a3c7f77e50f900721817934120d60c2361b3c40d 2025-09-07T06:39:17.6182558Z * [new tag] trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 -> trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 2025-09-07T06:39:17.6182688Z * [new tag] trunk/a3e5466002791da609fcb069155d8ee347baee92 -> trunk/a3e5466002791da609fcb069155d8ee347baee92 2025-09-07T06:39:17.6182817Z * [new tag] trunk/a714437093ed196eee28f7de454cf4c41badc098 -> trunk/a714437093ed196eee28f7de454cf4c41badc098 
2025-09-07T06:39:17.6182944Z * [new tag] trunk/a75e8cd27098f290de0b7439685d05ce02e91356 -> trunk/a75e8cd27098f290de0b7439685d05ce02e91356 2025-09-07T06:39:17.6183073Z * [new tag] trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae -> trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae 2025-09-07T06:39:17.6183208Z * [new tag] trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 -> trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 2025-09-07T06:39:17.6183374Z * [new tag] trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e -> trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e 2025-09-07T06:39:17.6183505Z * [new tag] trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 -> trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 2025-09-07T06:39:17.6183636Z * [new tag] trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 -> trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 2025-09-07T06:39:17.6183770Z * [new tag] trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c -> trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c 2025-09-07T06:39:17.6183902Z * [new tag] trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 -> trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 2025-09-07T06:39:17.6184069Z * [new tag] trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d -> trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d 2025-09-07T06:39:17.6184201Z * [new tag] trunk/adae7f66aacf3f248c3101b858cf98d5809119fa -> trunk/adae7f66aacf3f248c3101b858cf98d5809119fa 2025-09-07T06:39:17.6184340Z * [new tag] trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c -> trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c 2025-09-07T06:39:17.6184469Z * [new tag] trunk/aed33a8fcbd60b052d4559d261390c5797129c6d -> trunk/aed33a8fcbd60b052d4559d261390c5797129c6d 2025-09-07T06:39:17.6184596Z * [new tag] trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 -> trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 2025-09-07T06:39:17.6184729Z * [new tag] trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f -> trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f 2025-09-07T06:39:17.6184859Z * [new tag] 
trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 -> trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 2025-09-07T06:39:17.6184989Z * [new tag] trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 -> trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 2025-09-07T06:39:17.6185123Z * [new tag] trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de -> trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de 2025-09-07T06:39:17.6186274Z * [new tag] trunk/b2b4add0e754411372060e1d7b4057a66439172b -> trunk/b2b4add0e754411372060e1d7b4057a66439172b 2025-09-07T06:39:17.6186411Z * [new tag] trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 -> trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 2025-09-07T06:39:17.6186610Z * [new tag] trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 -> trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 2025-09-07T06:39:17.6186738Z * [new tag] trunk/b4ad38279b178b7bd14355123c1101e2e853e77b -> trunk/b4ad38279b178b7bd14355123c1101e2e853e77b 2025-09-07T06:39:17.6186870Z * [new tag] trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde -> trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde 2025-09-07T06:39:17.6187006Z * [new tag] trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c -> trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c 2025-09-07T06:39:17.6187138Z * [new tag] trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 -> trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 2025-09-07T06:39:17.6187271Z * [new tag] trunk/b7e207ca9f046ddd716076965a0cce403ba99052 -> trunk/b7e207ca9f046ddd716076965a0cce403ba99052 2025-09-07T06:39:17.6187399Z * [new tag] trunk/b919560c4a7010e2d89facee25586269a994746e -> trunk/b919560c4a7010e2d89facee25586269a994746e 2025-09-07T06:39:17.6187531Z * [new tag] trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 -> trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 2025-09-07T06:39:17.6187663Z * [new tag] trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 -> trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 2025-09-07T06:39:17.6187792Z * [new tag] trunk/bb950284c7e72905994bc25dd436c10e48088d85 -> 
trunk/bb950284c7e72905994bc25dd436c10e48088d85 2025-09-07T06:39:17.6188001Z * [new tag] trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d -> trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d 2025-09-07T06:39:17.6188137Z * [new tag] trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 -> trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 2025-09-07T06:39:17.6188263Z * [new tag] trunk/bc505977fb66677a09c31155c987330fbb18a865 -> trunk/bc505977fb66677a09c31155c987330fbb18a865 2025-09-07T06:39:17.6188397Z * [new tag] trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 -> trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 2025-09-07T06:39:17.6188530Z * [new tag] trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 -> trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 2025-09-07T06:39:17.6188663Z * [new tag] trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 -> trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 2025-09-07T06:39:17.6188847Z * [new tag] trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf -> trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf 2025-09-07T06:39:17.6188978Z * [new tag] trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 -> trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 2025-09-07T06:39:17.6189112Z * [new tag] trunk/c10195e723eeeedd099ed8b73eda7184ca618fad -> trunk/c10195e723eeeedd099ed8b73eda7184ca618fad 2025-09-07T06:39:17.6189243Z * [new tag] trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 -> trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 2025-09-07T06:39:17.6189370Z * [new tag] trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 -> trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 2025-09-07T06:39:17.6189494Z * [new tag] trunk/c32111149921b48bfef909293f1049e21619ed76 -> trunk/c32111149921b48bfef909293f1049e21619ed76 2025-09-07T06:39:17.6190658Z * [new tag] trunk/c37103234afc832dcad307e9016230810957c9d5 -> trunk/c37103234afc832dcad307e9016230810957c9d5 2025-09-07T06:39:17.6190792Z * [new tag] trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 -> trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 
2025-09-07T06:39:17.6190927Z * [new tag] trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd -> trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd 2025-09-07T06:39:17.6191055Z * [new tag] trunk/c465b3d52c5687fe910d35a5c75341b77f821741 -> trunk/c465b3d52c5687fe910d35a5c75341b77f821741 2025-09-07T06:39:17.6191184Z * [new tag] trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b -> trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b 2025-09-07T06:39:17.6191312Z * [new tag] trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 -> trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 2025-09-07T06:39:17.6191449Z * [new tag] trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 -> trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 2025-09-07T06:39:17.6191581Z * [new tag] trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b -> trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b 2025-09-07T06:39:17.6191712Z * [new tag] trunk/cbfb005f7cce79974795b148e265f594f59477c8 -> trunk/cbfb005f7cce79974795b148e265f594f59477c8 2025-09-07T06:39:17.6191845Z * [new tag] trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 -> trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 2025-09-07T06:39:17.6191974Z * [new tag] trunk/cd529b686d54bbaa443f5b310140de48422d96c7 -> trunk/cd529b686d54bbaa443f5b310140de48422d96c7 2025-09-07T06:39:17.6192101Z * [new tag] trunk/cec0ff122815582af5302360aff03676558c5c87 -> trunk/cec0ff122815582af5302360aff03676558c5c87 2025-09-07T06:39:17.6192232Z * [new tag] trunk/d11720efdb563d02cf4f7d324311fb15a755268e -> trunk/d11720efdb563d02cf4f7d324311fb15a755268e 2025-09-07T06:39:17.6192359Z * [new tag] trunk/d1706d9128ae24d9048167e80d3fe5196d19035e -> trunk/d1706d9128ae24d9048167e80d3fe5196d19035e 2025-09-07T06:39:17.6192492Z * [new tag] trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d -> trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d 2025-09-07T06:39:17.6192658Z * [new tag] trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 -> trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 2025-09-07T06:39:17.6192788Z * [new tag] 
trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e -> trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e 2025-09-07T06:39:17.6192917Z * [new tag] trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 -> trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 2025-09-07T06:39:17.6193048Z * [new tag] trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 -> trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 2025-09-07T06:39:17.6193174Z * [new tag] trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 -> trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 2025-09-07T06:39:17.6193329Z * [new tag] trunk/d5e0f4202ba14632e4d14862ace096609e763462 -> trunk/d5e0f4202ba14632e4d14862ace096609e763462 2025-09-07T06:39:17.6193458Z * [new tag] trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 -> trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 2025-09-07T06:39:17.6193583Z * [new tag] trunk/d64718503728001a1e78168fd7f2d4ff23e57285 -> trunk/d64718503728001a1e78168fd7f2d4ff23e57285 2025-09-07T06:39:17.6193708Z * [new tag] trunk/d67c29ad22670320d676b02e394274af34e8e643 -> trunk/d67c29ad22670320d676b02e394274af34e8e643 2025-09-07T06:39:17.6194862Z * [new tag] trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 -> trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 2025-09-07T06:39:17.6194991Z * [new tag] trunk/d711f27845abd45007ccab6076649ebd896c2661 -> trunk/d711f27845abd45007ccab6076649ebd896c2661 2025-09-07T06:39:17.6195124Z * [new tag] trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab -> trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab 2025-09-07T06:39:17.6195262Z * [new tag] trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 -> trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 2025-09-07T06:39:17.6195397Z * [new tag] trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 -> trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 2025-09-07T06:39:17.6195526Z * [new tag] trunk/dbec08729fb9848bebed6048c63831b87170d061 -> trunk/dbec08729fb9848bebed6048c63831b87170d061 2025-09-07T06:39:17.6195654Z * [new tag] trunk/dcf385395d838f38c8dca25913578230dd43099a -> 
trunk/dcf385395d838f38c8dca25913578230dd43099a 2025-09-07T06:39:17.6195784Z * [new tag] trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 -> trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 2025-09-07T06:39:17.6195920Z * [new tag] trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d -> trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d 2025-09-07T06:39:17.6196051Z * [new tag] trunk/e0a62b266c021b910ce6dc02a6c9429210487717 -> trunk/e0a62b266c021b910ce6dc02a6c9429210487717 2025-09-07T06:39:17.6196180Z * [new tag] trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 -> trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 2025-09-07T06:39:17.6196313Z * [new tag] trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 -> trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 2025-09-07T06:39:17.6196444Z * [new tag] trunk/e3068cdb446adefb5a875616ba37a60235391439 -> trunk/e3068cdb446adefb5a875616ba37a60235391439 2025-09-07T06:39:17.6196650Z * [new tag] trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 -> trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 2025-09-07T06:39:17.6196779Z * [new tag] trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 -> trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 2025-09-07T06:39:17.6196911Z * [new tag] trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 -> trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 2025-09-07T06:39:17.6197039Z * [new tag] trunk/e92cd9415377403b6e90585e764639e2e0b5973b -> trunk/e92cd9415377403b6e90585e764639e2e0b5973b 2025-09-07T06:39:17.6197234Z * [new tag] trunk/e9481b6617b5576b099d8ca5798111592e9ad090 -> trunk/e9481b6617b5576b099d8ca5798111592e9ad090 2025-09-07T06:39:17.6197368Z * [new tag] trunk/ea1883dfd3e42defe37b11202b878bb76defa087 -> trunk/ea1883dfd3e42defe37b11202b878bb76defa087 2025-09-07T06:39:17.6197505Z * [new tag] trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 -> trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 2025-09-07T06:39:17.6197635Z * [new tag] trunk/eb18d32bda75189494d955aa001ade15f10333de -> trunk/eb18d32bda75189494d955aa001ade15f10333de 
2025-09-07T06:39:17.6197768Z * [new tag] trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 -> trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 2025-09-07T06:39:17.6197955Z * [new tag] trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 -> trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 2025-09-07T06:39:17.6199202Z * [new tag] trunk/f00445b43eee57e20bb9316fa796ca23bf73373b -> trunk/f00445b43eee57e20bb9316fa796ca23bf73373b 2025-09-07T06:39:17.6199333Z * [new tag] trunk/f0c391102b754e3b145e8c59231d2df563487e37 -> trunk/f0c391102b754e3b145e8c59231d2df563487e37 2025-09-07T06:39:17.6199462Z * [new tag] trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 -> trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 2025-09-07T06:39:17.6199590Z * [new tag] trunk/f36f285953700f971552083a5da9d0ceacb63bbd -> trunk/f36f285953700f971552083a5da9d0ceacb63bbd 2025-09-07T06:39:17.6199723Z * [new tag] trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb -> trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb 2025-09-07T06:39:17.6199855Z * [new tag] trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c -> trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c 2025-09-07T06:39:17.6199987Z * [new tag] trunk/f612045ce105f008b2b675e2fc870163babeb2e8 -> trunk/f612045ce105f008b2b675e2fc870163babeb2e8 2025-09-07T06:39:17.6200118Z * [new tag] trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c -> trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c 2025-09-07T06:39:17.6200249Z * [new tag] trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c -> trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c 2025-09-07T06:39:17.6200378Z * [new tag] trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 -> trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 2025-09-07T06:39:17.6200512Z * [new tag] trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 -> trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 2025-09-07T06:39:17.6200644Z * [new tag] trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa -> trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa 2025-09-07T06:39:17.6200776Z * [new tag] 
trunk/fea20775ad96bdca972a1811d7d3372f368614ab -> trunk/fea20775ad96bdca972a1811d7d3372f368614ab 2025-09-07T06:39:17.6200905Z * [new tag] trunk/fefee081642f87419a21dc852f7167d4640443cd -> trunk/fefee081642f87419a21dc852f7167d4640443cd 2025-09-07T06:39:17.6200964Z * [new tag] v0.1.1 -> v0.1.1 2025-09-07T06:39:17.6201024Z * [new tag] v0.1.10 -> v0.1.10 2025-09-07T06:39:17.6201077Z * [new tag] v0.1.11 -> v0.1.11 2025-09-07T06:39:17.6201128Z * [new tag] v0.1.12 -> v0.1.12 2025-09-07T06:39:17.6201182Z * [new tag] v0.1.2 -> v0.1.2 2025-09-07T06:39:17.6201234Z * [new tag] v0.1.3 -> v0.1.3 2025-09-07T06:39:17.6201283Z * [new tag] v0.1.4 -> v0.1.4 2025-09-07T06:39:17.6201332Z * [new tag] v0.1.5 -> v0.1.5 2025-09-07T06:39:17.6201383Z * [new tag] v0.1.6 -> v0.1.6 2025-09-07T06:39:17.6201431Z * [new tag] v0.1.7 -> v0.1.7 2025-09-07T06:39:17.6202557Z * [new tag] v0.1.8 -> v0.1.8 2025-09-07T06:39:17.6202650Z * [new tag] v0.1.9 -> v0.1.9 2025-09-07T06:39:17.6202700Z * [new tag] v0.2.0 -> v0.2.0 2025-09-07T06:39:17.6202748Z * [new tag] v0.3.0 -> v0.3.0 2025-09-07T06:39:17.6202797Z * [new tag] v0.3.1 -> v0.3.1 2025-09-07T06:39:17.6202845Z * [new tag] v0.4.0 -> v0.4.0 2025-09-07T06:39:17.6202893Z * [new tag] v0.4.1 -> v0.4.1 2025-09-07T06:39:17.6202943Z * [new tag] v1.0.0 -> v1.0.0 2025-09-07T06:39:17.6202999Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-09-07T06:39:17.6203089Z * [new tag] v1.0.1 -> v1.0.1 2025-09-07T06:39:17.6203141Z * [new tag] v1.0rc0 -> v1.0rc0 2025-09-07T06:39:17.6203191Z * [new tag] v1.0rc1 -> v1.0rc1 2025-09-07T06:39:17.6203242Z * [new tag] v1.1.0 -> v1.1.0 2025-09-07T06:39:17.6203298Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-09-07T06:39:17.6203348Z * [new tag] v1.10.0 -> v1.10.0 2025-09-07T06:39:17.6203407Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-09-07T06:39:17.6203465Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-09-07T06:39:17.6203520Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-09-07T06:39:17.6203571Z * [new tag] v1.10.1 -> v1.10.1 2025-09-07T06:39:17.6203625Z * 
[new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-09-07T06:39:17.6203677Z * [new tag] v1.10.2 -> v1.10.2 2025-09-07T06:39:17.6203730Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-09-07T06:39:17.6203782Z * [new tag] v1.11.0 -> v1.11.0 2025-09-07T06:39:17.6203835Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-09-07T06:39:17.6203887Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-09-07T06:39:17.6204965Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-09-07T06:39:17.6205022Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-09-07T06:39:17.6205074Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-09-07T06:39:17.6205125Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-09-07T06:39:17.6205179Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-09-07T06:39:17.6205230Z * [new tag] v1.12.0 -> v1.12.0 2025-09-07T06:39:17.6205281Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-09-07T06:39:17.6205335Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-09-07T06:39:17.6205387Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-09-07T06:39:17.6205438Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-09-07T06:39:17.6205489Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-09-07T06:39:17.6205541Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-09-07T06:39:17.6205593Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-09-07T06:39:17.6205644Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-09-07T06:39:17.6205696Z * [new tag] v1.12.1 -> v1.12.1 2025-09-07T06:39:17.6205749Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-09-07T06:39:17.6205801Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-09-07T06:39:17.6205853Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-09-07T06:39:17.6205943Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-09-07T06:39:17.6205996Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-09-07T06:39:17.6206047Z * [new tag] v1.13.0 -> v1.13.0 2025-09-07T06:39:17.6206099Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-09-07T06:39:17.6206149Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-09-07T06:39:17.6206203Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-09-07T06:39:17.6207364Z * 
[new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-09-07T06:39:17.6207486Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-09-07T06:39:17.6207540Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-09-07T06:39:17.6207591Z * [new tag] v1.13.1 -> v1.13.1 2025-09-07T06:39:17.6207644Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-09-07T06:39:17.6207695Z * [new tag] v1.2.0 -> v1.2.0 2025-09-07T06:39:17.6207750Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-09-07T06:39:17.6207801Z * [new tag] v1.3.0 -> v1.3.0 2025-09-07T06:39:17.6207854Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-09-07T06:39:17.6207905Z * [new tag] v1.3.1 -> v1.3.1 2025-09-07T06:39:17.6207954Z * [new tag] v1.4.0 -> v1.4.0 2025-09-07T06:39:17.6208007Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-09-07T06:39:17.6208057Z * [new tag] v1.4.1 -> v1.4.1 2025-09-07T06:39:17.6208106Z * [new tag] v1.5.0 -> v1.5.0 2025-09-07T06:39:17.6208164Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-09-07T06:39:17.6208220Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-09-07T06:39:17.6208273Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-09-07T06:39:17.6208325Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-09-07T06:39:17.6208381Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-09-07T06:39:17.6208432Z * [new tag] v1.5.1 -> v1.5.1 2025-09-07T06:39:17.6208484Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-09-07T06:39:17.6208534Z * [new tag] v1.6.0 -> v1.6.0 2025-09-07T06:39:17.6208588Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-09-07T06:39:17.6208640Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-09-07T06:39:17.6208690Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-09-07T06:39:17.6209745Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-09-07T06:39:17.6209797Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-09-07T06:39:17.6209848Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-09-07T06:39:17.6209901Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-09-07T06:39:17.6209951Z * [new tag] v1.7.0 -> v1.7.0 2025-09-07T06:39:17.6210003Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-09-07T06:39:17.6210055Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 
2025-09-07T06:39:17.6210110Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-09-07T06:39:17.6210161Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-09-07T06:39:17.6210213Z * [new tag] v1.7.1 -> v1.7.1 2025-09-07T06:39:17.6210326Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-09-07T06:39:17.6210379Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-09-07T06:39:17.6210430Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-09-07T06:39:17.6210482Z * [new tag] v1.8.0 -> v1.8.0 2025-09-07T06:39:17.6210534Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-09-07T06:39:17.6210585Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-09-07T06:39:17.6210638Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-09-07T06:39:17.6210721Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-09-07T06:39:17.6210772Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-09-07T06:39:17.6210824Z * [new tag] v1.8.1 -> v1.8.1 2025-09-07T06:39:17.6210877Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-09-07T06:39:17.6210928Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-09-07T06:39:17.6210980Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-09-07T06:39:17.6211031Z * [new tag] v1.8.2 -> v1.8.2 2025-09-07T06:39:17.6212101Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-09-07T06:39:17.6212156Z * [new tag] v1.9.0 -> v1.9.0 2025-09-07T06:39:17.6212209Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-09-07T06:39:17.6212261Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-09-07T06:39:17.6212314Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-09-07T06:39:17.6212365Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-09-07T06:39:17.6212416Z * [new tag] v1.9.1 -> v1.9.1 2025-09-07T06:39:17.6212470Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-09-07T06:39:17.6212522Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-09-07T06:39:17.6212573Z * [new tag] v2.0.0 -> v2.0.0 2025-09-07T06:39:17.6212624Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-09-07T06:39:17.6212677Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-09-07T06:39:17.6212728Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-09-07T06:39:17.6212779Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 
2025-09-07T06:39:17.6212833Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-09-07T06:39:17.6212884Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-09-07T06:39:17.6212935Z * [new tag] v2.0.1 -> v2.0.1 2025-09-07T06:39:17.6212989Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-09-07T06:39:17.6213041Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-09-07T06:39:17.6213093Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-09-07T06:39:17.6213145Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-09-07T06:39:17.6213195Z * [new tag] v2.1.0 -> v2.1.0 2025-09-07T06:39:17.6213247Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-09-07T06:39:17.6213298Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-09-07T06:39:17.6214504Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-09-07T06:39:17.6214559Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-09-07T06:39:17.6214611Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-09-07T06:39:17.6214708Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-09-07T06:39:17.6214760Z * [new tag] v2.1.1 -> v2.1.1 2025-09-07T06:39:17.6214811Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-09-07T06:39:17.6214864Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-09-07T06:39:17.6214915Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-09-07T06:39:17.6214965Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-09-07T06:39:17.6215017Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-09-07T06:39:17.6215068Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-09-07T06:39:17.6215147Z * [new tag] v2.1.2 -> v2.1.2 2025-09-07T06:39:17.6215200Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-09-07T06:39:17.6215251Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-09-07T06:39:17.6215303Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-09-07T06:39:17.6215354Z * [new tag] v2.2.0 -> v2.2.0 2025-09-07T06:39:17.6215407Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-09-07T06:39:17.6215458Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-09-07T06:39:17.6215509Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-09-07T06:39:17.6215561Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-09-07T06:39:17.6215612Z * [new tag] v2.2.0-rc5 -> 
v2.2.0-rc5 2025-09-07T06:39:17.6215665Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-09-07T06:39:17.6215717Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-09-07T06:39:17.6215768Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-09-07T06:39:17.6215820Z * [new tag] v2.2.1 -> v2.2.1 2025-09-07T06:39:17.6217007Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-09-07T06:39:17.6217061Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-09-07T06:39:17.6217112Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-09-07T06:39:17.6217163Z * [new tag] v2.2.2 -> v2.2.2 2025-09-07T06:39:17.6217215Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-09-07T06:39:17.6217267Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-09-07T06:39:17.6217321Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-09-07T06:39:17.6217371Z * [new tag] v2.3.0 -> v2.3.0 2025-09-07T06:39:17.6217423Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-09-07T06:39:17.6217481Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-09-07T06:39:17.6217536Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-09-07T06:39:17.6217589Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-09-07T06:39:17.6217641Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-09-07T06:39:17.6217694Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-09-07T06:39:17.6217746Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-09-07T06:39:17.6217797Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-09-07T06:39:17.6217851Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-09-07T06:39:17.6217903Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-09-07T06:39:17.6217954Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-09-07T06:39:17.6218069Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-09-07T06:39:17.6218121Z * [new tag] v2.3.1 -> v2.3.1 2025-09-07T06:39:17.6218173Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-09-07T06:39:17.6218225Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-09-07T06:39:17.6218277Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-09-07T06:39:17.6219355Z * [new tag] v2.4.0 -> v2.4.0 2025-09-07T06:39:17.6219411Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-09-07T06:39:17.6219462Z * [new tag] 
v2.4.0-rc2 -> v2.4.0-rc2 2025-09-07T06:39:17.6219584Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-09-07T06:39:17.6219636Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-09-07T06:39:17.6219688Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-09-07T06:39:17.6219741Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-09-07T06:39:17.6219792Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-09-07T06:39:17.6219845Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-09-07T06:39:17.6219896Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-09-07T06:39:17.6219946Z * [new tag] v2.4.1 -> v2.4.1 2025-09-07T06:39:17.6219998Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-09-07T06:39:17.6220049Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-09-07T06:39:17.6220102Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-09-07T06:39:17.6220153Z * [new tag] v2.5.0 -> v2.5.0 2025-09-07T06:39:17.6220204Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-09-07T06:39:17.6220260Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-09-07T06:39:17.6220312Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-09-07T06:39:17.6220364Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-09-07T06:39:17.6220415Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-09-07T06:39:17.6220467Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-09-07T06:39:17.6220518Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-09-07T06:39:17.6220569Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-09-07T06:39:17.6220622Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-09-07T06:39:17.6221689Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-09-07T06:39:17.6221744Z * [new tag] v2.5.1 -> v2.5.1 2025-09-07T06:39:17.6221797Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-09-07T06:39:17.6221848Z * [new tag] v2.6.0 -> v2.6.0 2025-09-07T06:39:17.6221900Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-09-07T06:39:17.6221951Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-09-07T06:39:17.6222004Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-09-07T06:39:17.6222055Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-09-07T06:39:17.6222106Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 
2025-09-07T06:39:17.6222158Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-09-07T06:39:17.6222210Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-09-07T06:39:17.6222261Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-09-07T06:39:17.6222313Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-09-07T06:39:17.6222405Z * [new tag] v2.7.0 -> v2.7.0 2025-09-07T06:39:17.6222457Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-09-07T06:39:17.6222512Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-09-07T06:39:17.6222564Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-09-07T06:39:17.6222615Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-09-07T06:39:17.6222666Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-09-07T06:39:17.6222719Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-09-07T06:39:17.6222802Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-09-07T06:39:17.6222854Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-09-07T06:39:17.6222907Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-09-07T06:39:17.6222959Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-09-07T06:39:17.6223009Z * [new tag] v2.7.1 -> v2.7.1 2025-09-07T06:39:17.6224076Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-09-07T06:39:17.6224128Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-09-07T06:39:17.6224179Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-09-07T06:39:17.6224231Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-09-07T06:39:17.6224283Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-09-07T06:39:17.6224334Z * [new tag] v2.8.0 -> v2.8.0 2025-09-07T06:39:17.6224386Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-09-07T06:39:17.6224438Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-09-07T06:39:17.6224490Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-09-07T06:39:17.6224543Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-09-07T06:39:17.6224594Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-09-07T06:39:17.6224645Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-09-07T06:39:17.6224696Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-09-07T06:39:17.6224747Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-09-07T06:39:17.6224808Z * [new tag] 
whc_flight_1 -> whc_flight_1 2025-09-07T06:39:17.6224868Z * [new tag] whc_flight_2 -> whc_flight_2 2025-09-07T06:39:17.6224925Z * [new tag] whc_flight_4 -> whc_flight_4 2025-09-07T06:39:17.8388091Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T06:39:17.8411479Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:39:17.8414619Z ##[endgroup] 2025-09-07T06:39:17.8414826Z ##[group]Determining the checkout info 2025-09-07T06:39:17.8415043Z ##[endgroup] 2025-09-07T06:39:17.8416440Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T06:39:17.8439491Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T06:39:17.8466162Z ##[group]Checking out the ref 2025-09-07T06:39:17.8468449Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:39:18.5036277Z Note: switching to '93fb23d6fae7c4e82c4239a1033e522088742634'. 2025-09-07T06:39:18.5036688Z 2025-09-07T06:39:18.5036850Z You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T06:39:18.5037050Z changes and commit them, and you can discard any commits you make in this 2025-09-07T06:39:18.5037256Z state without impacting any branches by switching back to a branch. 2025-09-07T06:39:18.5037434Z 2025-09-07T06:39:18.5037764Z If you want to create a new branch to retain commits you create, you may 2025-09-07T06:39:18.5038400Z do so (now or later) by using -c with the switch command. 
Example: 2025-09-07T06:39:18.5038554Z 2025-09-07T06:39:18.5038627Z git switch -c <new-branch-name> 2025-09-07T06:39:18.5038743Z 2025-09-07T06:39:18.5038808Z Or undo this operation with: 2025-09-07T06:39:18.5038974Z 2025-09-07T06:39:18.5039026Z git switch - 2025-09-07T06:39:18.5039103Z 2025-09-07T06:39:18.5039238Z Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T06:39:18.5039432Z 2025-09-07T06:39:18.5039538Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T06:39:18.5104679Z ##[endgroup] 2025-09-07T06:39:18.5104854Z ##[group]Setting up auth for fetching submodules 2025-09-07T06:39:18.5105134Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-09-07T06:39:18.5136606Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-09-07T06:39:18.5162122Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-09-07T06:39:18.5180574Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-09-07T06:39:18.5203936Z ##[endgroup] 2025-09-07T06:39:18.5204129Z ##[group]Fetching submodules 2025-09-07T06:39:18.5204261Z [command]/usr/bin/git submodule sync --recursive 2025-09-07T06:39:18.5389265Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-09-07T06:39:18.5548767Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-09-07T06:39:18.5551408Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-09-07T06:39:18.5554593Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-09-07T06:39:18.5556473Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path
'third_party/NNPACK' 2025-09-07T06:39:18.5558323Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-09-07T06:39:18.5560563Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-09-07T06:39:18.5566228Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-09-07T06:39:18.5566741Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-09-07T06:39:18.5567308Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-09-07T06:39:18.5569375Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-09-07T06:39:18.5571162Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-09-07T06:39:18.5572986Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-09-07T06:39:18.5574824Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-09-07T06:39:18.5576782Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-09-07T06:39:18.5578837Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-09-07T06:39:18.5580735Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-09-07T06:39:18.5582821Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 
'third_party/flatbuffers' 2025-09-07T06:39:18.5586687Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-09-07T06:39:18.5587051Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:39:18.5588659Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-09-07T06:39:18.5590618Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-09-07T06:39:18.5592642Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-09-07T06:39:18.5594654Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-09-07T06:39:18.5596762Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-09-07T06:39:18.5598891Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-09-07T06:39:18.5601000Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-09-07T06:39:18.5602988Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-09-07T06:39:18.5604927Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-09-07T06:39:18.5607168Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-09-07T06:39:18.5609489Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-09-07T06:39:18.5611635Z Submodule 'third_party/protobuf' 
(https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-09-07T06:39:18.5613680Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-09-07T06:39:18.5615974Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-09-07T06:39:18.5618387Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-09-07T06:39:18.5620532Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-09-07T06:39:18.5622905Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-09-07T06:39:18.5625179Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-09-07T06:39:18.5649652Z Cloning into '/home/runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-09-07T06:39:18.9373895Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/psimd'... 2025-09-07T06:39:18.9379216Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-09-07T06:39:18.9379423Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/FP16'... 2025-09-07T06:39:18.9379618Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-09-07T06:39:18.9379819Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-09-07T06:39:18.9388289Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-09-07T06:39:19.1445038Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-09-07T06:39:19.1445468Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/python-peachpy'... 
2025-09-07T06:39:19.1445822Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-09-07T06:39:19.1446881Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/ideep'... 2025-09-07T06:39:19.1447182Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/gloo'... 2025-09-07T06:39:19.1447502Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-09-07T06:39:19.2445753Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-09-07T06:39:20.4284054Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-09-07T06:39:20.4284511Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-09-07T06:39:20.4284825Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-09-07T06:39:20.4285834Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-09-07T06:39:20.4286164Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-09-07T06:39:20.4286822Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-09-07T06:39:20.4287123Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/sleef'... 2025-09-07T06:39:20.4287434Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/googletest'... 2025-09-07T06:39:20.4287752Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-09-07T06:39:20.4288058Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/pybind11'... 2025-09-07T06:39:20.4297808Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-09-07T06:39:20.4298134Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fmt'... 2025-09-07T06:39:20.5169262Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-09-07T06:39:27.5746899Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto'... 
2025-09-07T06:39:27.5747582Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-09-07T06:39:27.5748177Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-09-07T06:39:27.5748698Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-09-07T06:39:27.5749209Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/onnx'... 2025-09-07T06:39:27.5758889Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-09-07T06:39:27.5759199Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/aiter'... 2025-09-07T06:39:27.5759509Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-09-07T06:39:27.5759824Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-09-07T06:39:27.5760132Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/protobuf'... 2025-09-07T06:39:27.5862109Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-09-07T06:39:27.5938478Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-09-07T06:39:27.6002629Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-09-07T06:39:27.6140515Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-09-07T06:39:27.6556392Z Submodule path 'third_party/NVTX': checked out '2942f167cc30c5e3a44a2aecd5b0d9c07ff61a07' 2025-09-07T06:39:27.6802381Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-09-07T06:39:28.0005304Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-09-07T06:39:28.0801393Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-09-07T06:39:28.0822989Z Submodule '3rdparty/composable_kernel' 
(https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:39:28.0854356Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 2025-09-07T06:39:30.7874194Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-09-07T06:39:30.8020074Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-09-07T06:39:30.9462630Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-09-07T06:39:30.9766959Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-09-07T06:39:31.0478474Z Submodule path 'third_party/cpuinfo': checked out '5e3d2445e6a84d9599bee2bf78edbb4d80865e1d' 2025-09-07T06:39:31.0722031Z Submodule path 'third_party/cudnn_frontend': checked out 'f937055efc6d414d11f4c6577e3977fe74f35fb6' 2025-09-07T06:39:31.3714108Z Submodule path 'third_party/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T06:39:31.4519306Z Submodule path 'third_party/fbgemm': checked out '4b39c551efe15e6bbade20565b0ceb2d8ce3352d' 2025-09-07T06:39:31.4553315Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-09-07T06:39:31.4614433Z Submodule 'external/composable_kernel' (https://github.com/jwfromm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:39:31.4618023Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:39:31.4619982Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-09-07T06:39:31.4621826Z Submodule 'external/googletest' 
(https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-09-07T06:39:31.4628463Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:39:31.4636404Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-09-07T06:39:31.4667839Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-09-07T06:39:32.5542369Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-09-07T06:39:32.5542673Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-09-07T06:39:32.5542924Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-09-07T06:39:32.5543192Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-09-07T06:39:32.6542559Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-09-07T06:39:32.9211763Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-09-07T06:39:34.9519743Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-09-07T06:39:35.0717814Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out 'b1281b8b08d973a7064f864f47eeb30f3e2596e9' 2025-09-07T06:39:35.1256303Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-09-07T06:39:35.4411179Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '311f3c8e51dc0eb56310cfc6980bf63d0fbd7917' 2025-09-07T06:39:35.4681453Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T06:39:35.4784021Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-09-07T06:39:35.5341466Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-09-07T06:39:35.5739782Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-09-07T06:39:35.5753130Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:39:35.5760368Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:39:35.5778317Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-09-07T06:39:39.0337285Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-09-07T06:39:39.2582030Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-09-07T06:39:39.5280082Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-09-07T06:39:39.6565023Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-09-07T06:39:39.7415368Z Submodule path 'third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-09-07T06:39:39.8300310Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-09-07T06:39:39.9125639Z Submodule path 'third_party/gloo': checked out 'c7b7b022c124d9643957d9bd55f57ac59fce8fa2' 2025-09-07T06:39:40.0479985Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-09-07T06:39:40.0595034Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-09-07T06:39:40.0662784Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-09-07T06:39:40.0685966Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-09-07T06:39:48.9801182Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-09-07T06:39:48.9944549Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-09-07T06:39:49.0461921Z Submodule path 'third_party/kineto': checked out '5e7501833f1021ce6f618572d3baf657b6319658' 2025-09-07T06:39:49.0535409Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:39:49.0560746Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:39:49.0582758Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:39:49.0611336Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-09-07T06:39:49.8412374Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-09-07T06:39:50.0621163Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-09-07T06:39:50.1131200Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e' 2025-09-07T06:39:50.1215318Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:39:50.1225288Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:39:50.1233424Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:39:50.1248704Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:39:50.1304262Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:39:50.1356888Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:39:50.1425076Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:39:50.1429486Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:39:50.1451778Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-09-07T06:39:51.3136175Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 
2025-09-07T06:39:51.3137469Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-09-07T06:39:51.3138392Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-09-07T06:39:51.3139286Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-09-07T06:39:51.3140206Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-09-07T06:39:51.3141123Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-09-07T06:39:51.4135572Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 2025-09-07T06:39:54.2920610Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-09-07T06:39:54.3895877Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-09-07T06:39:54.4709023Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-09-07T06:39:54.5381921Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-09-07T06:39:54.5859533Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:39:54.5887371Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-09-07T06:39:55.6863468Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-09-07T06:39:55.6999627Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-09-07T06:39:55.7445518Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850' 2025-09-07T06:39:55.8002251Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-09-07T06:39:55.8112796Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-09-07T06:39:55.8865673Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164' 2025-09-07T06:39:55.9411731Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2025-09-07T06:39:56.0007917Z Submodule path 'third_party/kleidiai': checked out 'cca02c2f69dd18e1f12647c1c0bdc8cf90e680c7' 2025-09-07T06:39:56.0572789Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-09-07T06:39:56.1387824Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-09-07T06:39:56.3701031Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-09-07T06:39:56.3731652Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-09-07T06:39:56.3756796Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-09-07T06:39:57.2156810Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-09-07T06:39:57.2505972Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-09-07T06:39:57.2522695Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:39:57.2524223Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:39:57.2532173Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:39:57.2532772Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:39:57.2533502Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:39:57.2534266Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:39:57.2534952Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:39:57.2535552Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:39:57.2560129Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 
2025-09-07T06:39:57.7550122Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-09-07T06:39:57.7550523Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-09-07T06:39:57.7556466Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-09-07T06:39:57.7556987Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-09-07T06:39:57.8551219Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-09-07T06:39:58.1613131Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-09-07T06:40:01.3500349Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 2025-09-07T06:40:03.2281292Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-09-07T06:40:03.2549723Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-09-07T06:40:03.2896695Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-09-07T06:40:03.3442445Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-09-07T06:40:03.3687906Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-09-07T06:40:03.4191027Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-09-07T06:40:03.4707885Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': 
checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-09-07T06:40:03.4923389Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:03.5109566Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:03.5132255Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-09-07T06:40:05.3888790Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 2025-09-07T06:40:08.9116780Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-09-07T06:40:09.1379825Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-09-07T06:40:09.5631183Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-09-07T06:40:09.5735720Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-09-07T06:40:09.7302684Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-09-07T06:40:09.7373517Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:09.7411371Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:09.7440653Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 
2025-09-07T06:40:11.9435346Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 2025-09-07T06:40:11.9518521Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-09-07T06:40:11.9929237Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-09-07T06:40:11.9997445Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-09-07T06:40:12.0080049Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-09-07T06:40:12.0295063Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-09-07T06:40:12.0452032Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-09-07T06:40:12.0689462Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-09-07T06:40:12.0860352Z Submodule path 'third_party/tensorpipe': checked out 'af0118d13e52f5a08841464a768e01a0bf3e3075' 2025-09-07T06:40:12.0867144Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:40:12.0867676Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:12.0869931Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:12.0871618Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:12.0891287Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 
2025-09-07T06:40:12.8472797Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2025-09-07T06:40:12.9472167Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-09-07T06:40:13.1146846Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-09-07T06:40:13.1476108Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-09-07T06:40:13.1571187Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-09-07T06:40:13.1940595Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-09-07T06:40:13.2111645Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-09-07T06:40:13.2122474Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:13.2143177Z Cloning into '/home/runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-09-07T06:40:13.5524993Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-09-07T06:40:13.5577382Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-09-07T06:40:13.5779978Z Entering 'android/libs/fbjni' 2025-09-07T06:40:13.5817991Z Entering 'third_party/FP16' 2025-09-07T06:40:13.5844314Z Entering 'third_party/FXdiv' 2025-09-07T06:40:13.5880081Z Entering 'third_party/NNPACK' 2025-09-07T06:40:13.5910517Z Entering 'third_party/NVTX' 2025-09-07T06:40:13.5936435Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:40:13.5964754Z Entering 'third_party/XNNPACK' 2025-09-07T06:40:13.6002555Z Entering 'third_party/aiter' 2025-09-07T06:40:13.6028810Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:40:13.6060652Z Entering 'third_party/benchmark' 2025-09-07T06:40:13.6082879Z Entering 'third_party/composable_kernel' 2025-09-07T06:40:13.6109278Z Entering 'third_party/cpp-httplib' 2025-09-07T06:40:13.6134843Z Entering 'third_party/cpuinfo' 2025-09-07T06:40:13.6158806Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:40:13.6180826Z Entering 'third_party/cutlass' 2025-09-07T06:40:13.6207686Z Entering 'third_party/fbgemm' 2025-09-07T06:40:13.6232067Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:40:13.6256598Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:40:13.6280859Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:40:13.6310367Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:40:13.6335290Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:40:13.6364544Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:40:13.6385022Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:40:13.6412528Z Entering 'third_party/flash-attention' 2025-09-07T06:40:13.6446671Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-09-07T06:40:13.6470272Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:40:13.6495435Z Entering 'third_party/flatbuffers' 2025-09-07T06:40:13.6535869Z Entering 'third_party/fmt' 2025-09-07T06:40:13.6558925Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:40:13.6579904Z Entering 'third_party/gloo' 2025-09-07T06:40:13.6608561Z Entering 'third_party/googletest' 2025-09-07T06:40:13.6637097Z Entering 'third_party/ideep' 2025-09-07T06:40:13.6664971Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:40:13.6686891Z Entering 'third_party/ittapi' 2025-09-07T06:40:13.6709072Z Entering 'third_party/kineto' 2025-09-07T06:40:13.6739456Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:40:13.6756762Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:40:13.6786889Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:40:13.6809968Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:40:13.6837472Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:40:13.6861049Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:40:13.6883158Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:40:13.6914096Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:40:13.6938505Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:40:13.6968283Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:40:13.6991145Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:40:13.7014854Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:40:13.7035957Z Entering 'third_party/kleidiai' 2025-09-07T06:40:13.7058138Z Entering 'third_party/mimalloc' 
2025-09-07T06:40:13.7086751Z Entering 'third_party/nlohmann' 2025-09-07T06:40:13.7108162Z Entering 'third_party/onnx' 2025-09-07T06:40:13.7143861Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:40:13.7170899Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:40:13.7198684Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:40:13.7221336Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:40:13.7241390Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:40:13.7261785Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:40:13.7281628Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:40:13.7300124Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:40:13.7326751Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:40:13.7366653Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:13.7391594Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:13.7412799Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:40:13.7446637Z Entering 'third_party/pocketfft' 2025-09-07T06:40:13.7477095Z Entering 'third_party/protobuf' 2025-09-07T06:40:13.7507253Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:13.7529958Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:13.7556595Z Entering 'third_party/psimd' 2025-09-07T06:40:13.7581658Z Entering 'third_party/pthreadpool' 2025-09-07T06:40:13.7603334Z Entering 'third_party/pybind11' 2025-09-07T06:40:13.7626160Z Entering 'third_party/python-peachpy' 2025-09-07T06:40:13.7649962Z Entering 'third_party/sleef' 2025-09-07T06:40:13.7672792Z Entering 'third_party/tensorpipe' 2025-09-07T06:40:13.7696192Z Entering 'third_party/tensorpipe/third_party/googletest' 
2025-09-07T06:40:13.7718748Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:13.7739482Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:13.7761943Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:13.7783228Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:13.7815588Z ##[endgroup] 2025-09-07T06:40:13.7817981Z ##[group]Persisting credentials for submodules 2025-09-07T06:40:13.7822429Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-09-07T06:40:13.7985984Z Entering 'android/libs/fbjni' 2025-09-07T06:40:13.8012213Z Entering 'third_party/FP16' 2025-09-07T06:40:13.8040325Z Entering 'third_party/FXdiv' 2025-09-07T06:40:13.8067817Z Entering 'third_party/NNPACK' 2025-09-07T06:40:13.8094381Z Entering 'third_party/NVTX' 2025-09-07T06:40:13.8118932Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:40:13.8149401Z Entering 'third_party/XNNPACK' 2025-09-07T06:40:13.8183271Z Entering 'third_party/aiter' 2025-09-07T06:40:13.8209022Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:40:13.8241830Z Entering 'third_party/benchmark' 2025-09-07T06:40:13.8271963Z Entering 'third_party/composable_kernel' 2025-09-07T06:40:13.8303126Z Entering 'third_party/cpp-httplib' 2025-09-07T06:40:13.8326223Z Entering 'third_party/cpuinfo' 2025-09-07T06:40:13.8355557Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:40:13.8381254Z Entering 'third_party/cutlass' 2025-09-07T06:40:13.8417273Z Entering 'third_party/fbgemm' 2025-09-07T06:40:13.8447874Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:40:13.8475219Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:40:13.8502865Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:40:13.8527081Z Entering 
'third_party/fbgemm/external/cutlass' 2025-09-07T06:40:13.8553750Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:40:13.8576936Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:40:13.8601834Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:40:13.8627574Z Entering 'third_party/flash-attention' 2025-09-07T06:40:13.8658112Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:40:13.8683616Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:40:13.8710788Z Entering 'third_party/flatbuffers' 2025-09-07T06:40:13.8742808Z Entering 'third_party/fmt' 2025-09-07T06:40:13.8777499Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:40:13.8808597Z Entering 'third_party/gloo' 2025-09-07T06:40:13.8844319Z Entering 'third_party/googletest' 2025-09-07T06:40:13.8870178Z Entering 'third_party/ideep' 2025-09-07T06:40:13.8895049Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:40:13.8925038Z Entering 'third_party/ittapi' 2025-09-07T06:40:13.8962451Z Entering 'third_party/kineto' 2025-09-07T06:40:13.8987895Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:40:13.9012915Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:40:13.9040877Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:40:13.9065153Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:40:13.9090387Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:40:13.9112633Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:40:13.9142502Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:40:13.9169053Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:40:13.9200451Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:40:13.9231451Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:40:13.9264756Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:40:13.9285366Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:40:13.9313424Z Entering 'third_party/kleidiai' 2025-09-07T06:40:13.9340077Z Entering 'third_party/mimalloc' 2025-09-07T06:40:13.9365785Z Entering 'third_party/nlohmann' 2025-09-07T06:40:13.9392327Z Entering 'third_party/onnx' 2025-09-07T06:40:13.9425835Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:40:13.9458218Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:40:13.9488777Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:40:13.9519016Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:40:13.9543910Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:40:13.9567071Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:40:13.9593742Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:40:13.9616454Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:40:13.9640534Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:40:13.9664036Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:13.9695171Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:13.9720772Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:40:13.9751887Z Entering 'third_party/pocketfft' 2025-09-07T06:40:13.9777719Z Entering 'third_party/protobuf' 2025-09-07T06:40:13.9805092Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:13.9828559Z Entering 
'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:13.9851689Z Entering 'third_party/psimd' 2025-09-07T06:40:13.9879805Z Entering 'third_party/pthreadpool' 2025-09-07T06:40:13.9907146Z Entering 'third_party/pybind11' 2025-09-07T06:40:13.9936445Z Entering 'third_party/python-peachpy' 2025-09-07T06:40:13.9961680Z Entering 'third_party/sleef' 2025-09-07T06:40:13.9999628Z Entering 'third_party/tensorpipe' 2025-09-07T06:40:14.0023240Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:40:14.0052931Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:14.0097222Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:14.0123428Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:14.0153326Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:14.0196347Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-09-07T06:40:14.0353395Z Entering 'android/libs/fbjni' 2025-09-07T06:40:14.0374368Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-09-07T06:40:14.0385949Z Entering 'third_party/FP16' 2025-09-07T06:40:14.0410834Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-09-07T06:40:14.0422023Z Entering 'third_party/FXdiv' 2025-09-07T06:40:14.0445378Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-09-07T06:40:14.0456368Z Entering 'third_party/NNPACK' 2025-09-07T06:40:14.0479620Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-09-07T06:40:14.0499790Z Entering 'third_party/NVTX' 2025-09-07T06:40:14.0524396Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-09-07T06:40:14.0535546Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:40:14.0558627Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-09-07T06:40:14.0570134Z Entering 'third_party/XNNPACK' 2025-09-07T06:40:14.0591624Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-09-07T06:40:14.0607819Z Entering 'third_party/aiter' 2025-09-07T06:40:14.0628306Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-09-07T06:40:14.0638337Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:40:14.0660712Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-09-07T06:40:14.0675697Z Entering 'third_party/benchmark' 2025-09-07T06:40:14.0697557Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-09-07T06:40:14.0708750Z Entering 'third_party/composable_kernel' 2025-09-07T06:40:14.0736005Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-09-07T06:40:14.0751493Z Entering 'third_party/cpp-httplib' 2025-09-07T06:40:14.0773013Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-09-07T06:40:14.0783605Z Entering 'third_party/cpuinfo' 2025-09-07T06:40:14.0805556Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-09-07T06:40:14.0816684Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:40:14.0842752Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-09-07T06:40:14.0854020Z Entering 'third_party/cutlass' 2025-09-07T06:40:14.0875379Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-09-07T06:40:14.0891409Z Entering 'third_party/fbgemm' 2025-09-07T06:40:14.0912863Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-09-07T06:40:14.0924704Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:40:14.0948289Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-09-07T06:40:14.0955403Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:40:14.0980417Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-09-07T06:40:14.0993536Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:40:14.1014819Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-09-07T06:40:14.1025605Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:40:14.1050126Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-09-07T06:40:14.1065164Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:40:14.1092371Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-09-07T06:40:14.1102227Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:40:14.1124034Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-09-07T06:40:14.1134111Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:40:14.1164740Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-09-07T06:40:14.1179034Z Entering 'third_party/flash-attention' 2025-09-07T06:40:14.1200713Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-09-07T06:40:14.1212651Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:40:14.1232062Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-09-07T06:40:14.1242472Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:40:14.1269230Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-09-07T06:40:14.1283263Z Entering 'third_party/flatbuffers' 2025-09-07T06:40:14.1307838Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-09-07T06:40:14.1320467Z Entering 'third_party/fmt' 2025-09-07T06:40:14.1344918Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-09-07T06:40:14.1355943Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:40:14.1378865Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-09-07T06:40:14.1386388Z Entering 'third_party/gloo' 2025-09-07T06:40:14.1408520Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-09-07T06:40:14.1416840Z Entering 'third_party/googletest' 2025-09-07T06:40:14.1444094Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:40:14.1455170Z Entering 'third_party/ideep' 2025-09-07T06:40:14.1480134Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-09-07T06:40:14.1490488Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:40:14.1511853Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-09-07T06:40:14.1527394Z Entering 
'third_party/ittapi' 2025-09-07T06:40:14.1548363Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-09-07T06:40:14.1559988Z Entering 'third_party/kineto' 2025-09-07T06:40:14.1591715Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-09-07T06:40:14.1601811Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:40:14.1630854Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-09-07T06:40:14.1644917Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:40:14.1666740Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-09-07T06:40:14.1680506Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:40:14.1705872Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-09-07T06:40:14.1715684Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:40:14.1740967Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-09-07T06:40:14.1750844Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:40:14.1784133Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-09-07T06:40:14.1795554Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:40:14.1817202Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-09-07T06:40:14.1827816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:40:14.1853807Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-09-07T06:40:14.1863616Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:40:14.1893145Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:40:14.1904233Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:40:14.1936958Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-09-07T06:40:14.1948147Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:40:14.1970658Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-09-07T06:40:14.1988437Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:40:14.2010071Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-09-07T06:40:14.2020666Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:40:14.2041159Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-09-07T06:40:14.2055913Z Entering 'third_party/kleidiai' 2025-09-07T06:40:14.2077162Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-09-07T06:40:14.2088612Z Entering 'third_party/mimalloc' 2025-09-07T06:40:14.2108769Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-09-07T06:40:14.2117799Z Entering 'third_party/nlohmann' 2025-09-07T06:40:14.2140324Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-09-07T06:40:14.2152321Z Entering 'third_party/onnx' 2025-09-07T06:40:14.2174804Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-09-07T06:40:14.2191695Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:40:14.2213421Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-09-07T06:40:14.2226268Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:40:14.2252061Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-09-07T06:40:14.2262845Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:40:14.2282045Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-09-07T06:40:14.2292381Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:40:14.2311866Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:40:14.2322524Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:40:14.2342857Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-09-07T06:40:14.2352316Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:40:14.2375322Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-09-07T06:40:14.2386994Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:40:14.2410410Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-09-07T06:40:14.2421550Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:40:14.2444603Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-09-07T06:40:14.2454771Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:40:14.2475185Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-09-07T06:40:14.2487662Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:14.2512902Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-09-07T06:40:14.2524319Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:14.2546210Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-09-07T06:40:14.2558075Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:40:14.2580857Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-09-07T06:40:14.2601287Z Entering 'third_party/pocketfft' 2025-09-07T06:40:14.2622311Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config 
remote.origin.url 2025-09-07T06:40:14.2633182Z Entering 'third_party/protobuf' 2025-09-07T06:40:14.2659622Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-09-07T06:40:14.2670588Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:14.2692154Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-09-07T06:40:14.2705052Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:14.2734165Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:40:14.2745334Z Entering 'third_party/psimd' 2025-09-07T06:40:14.2766133Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-09-07T06:40:14.2777823Z Entering 'third_party/pthreadpool' 2025-09-07T06:40:14.2803347Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-09-07T06:40:14.2812964Z Entering 'third_party/pybind11' 2025-09-07T06:40:14.2834101Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-09-07T06:40:14.2845160Z Entering 'third_party/python-peachpy' 2025-09-07T06:40:14.2872280Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-09-07T06:40:14.2883808Z Entering 'third_party/sleef' 2025-09-07T06:40:14.2906210Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-09-07T06:40:14.2921169Z Entering 'third_party/tensorpipe' 2025-09-07T06:40:14.2949704Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-09-07T06:40:14.2961229Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:40:14.2992810Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-09-07T06:40:14.3003101Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:14.3023638Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-09-07T06:40:14.3036869Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:14.3060599Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-09-07T06:40:14.3070770Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:14.3095110Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-09-07T06:40:14.3107446Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:14.3137060Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-09-07T06:40:14.3314991Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-09-07T06:40:14.3518359Z Entering 'android/libs/fbjni' 2025-09-07T06:40:14.3541724Z Entering 'third_party/FP16' 2025-09-07T06:40:14.3567214Z Entering 'third_party/FXdiv' 2025-09-07T06:40:14.3597928Z Entering 'third_party/NNPACK' 2025-09-07T06:40:14.3625081Z Entering 'third_party/NVTX' 2025-09-07T06:40:14.3654423Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:40:14.3685732Z Entering 'third_party/XNNPACK' 2025-09-07T06:40:14.3714041Z Entering 'third_party/aiter' 2025-09-07T06:40:14.3746613Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:40:14.3782409Z Entering 'third_party/benchmark' 2025-09-07T06:40:14.3808705Z Entering 'third_party/composable_kernel' 
2025-09-07T06:40:14.3843521Z Entering 'third_party/cpp-httplib' 2025-09-07T06:40:14.3870106Z Entering 'third_party/cpuinfo' 2025-09-07T06:40:14.3892687Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:40:14.3913746Z Entering 'third_party/cutlass' 2025-09-07T06:40:14.3939762Z Entering 'third_party/fbgemm' 2025-09-07T06:40:14.3965031Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:40:14.3989375Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:40:14.4014725Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:40:14.4037902Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:40:14.4069790Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:40:14.4096176Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:40:14.4129761Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:40:14.4159060Z Entering 'third_party/flash-attention' 2025-09-07T06:40:14.4182720Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:40:14.4207518Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:40:14.4246397Z Entering 'third_party/flatbuffers' 2025-09-07T06:40:14.4271220Z Entering 'third_party/fmt' 2025-09-07T06:40:14.4299750Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:40:14.4323819Z Entering 'third_party/gloo' 2025-09-07T06:40:14.4353315Z Entering 'third_party/googletest' 2025-09-07T06:40:14.4379784Z Entering 'third_party/ideep' 2025-09-07T06:40:14.4408408Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:40:14.4439598Z Entering 'third_party/ittapi' 2025-09-07T06:40:14.4467718Z Entering 'third_party/kineto' 2025-09-07T06:40:14.4489978Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:40:14.4516763Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:40:14.4552107Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:40:14.4573167Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:40:14.4598216Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:40:14.4620533Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:40:14.4652791Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:40:14.4682486Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:40:14.4721849Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:40:14.4747850Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:40:14.4781854Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:40:14.4815652Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:40:14.4845560Z Entering 'third_party/kleidiai' 2025-09-07T06:40:14.4868914Z Entering 'third_party/mimalloc' 2025-09-07T06:40:14.4898221Z Entering 'third_party/nlohmann' 2025-09-07T06:40:14.4928768Z Entering 'third_party/onnx' 2025-09-07T06:40:14.4964403Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:40:14.4992254Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:40:14.5016344Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:40:14.5043851Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:40:14.5069599Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:40:14.5098220Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:40:14.5133225Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:40:14.5156755Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:40:14.5187904Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:40:14.5210377Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:14.5237047Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:14.5266409Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:40:14.5297016Z Entering 'third_party/pocketfft' 2025-09-07T06:40:14.5319410Z Entering 'third_party/protobuf' 2025-09-07T06:40:14.5342789Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:14.5366790Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:14.5406812Z Entering 'third_party/psimd' 2025-09-07T06:40:14.5428918Z Entering 'third_party/pthreadpool' 2025-09-07T06:40:14.5455615Z Entering 'third_party/pybind11' 2025-09-07T06:40:14.5484905Z Entering 'third_party/python-peachpy' 2025-09-07T06:40:14.5510570Z Entering 'third_party/sleef' 2025-09-07T06:40:14.5818182Z Entering 'third_party/tensorpipe' 2025-09-07T06:40:14.5846312Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:40:14.5875717Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:14.5900198Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:14.5929528Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:14.5957193Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:14.6002268Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-09-07T06:40:14.6171307Z Entering 'android/libs/fbjni' 2025-09-07T06:40:14.6196933Z Entering 'third_party/FP16' 2025-09-07T06:40:14.6220620Z Entering 'third_party/FXdiv' 2025-09-07T06:40:14.6248110Z Entering 'third_party/NNPACK' 2025-09-07T06:40:14.6269501Z Entering 'third_party/NVTX' 2025-09-07T06:40:14.6299766Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:40:14.6326869Z Entering 'third_party/XNNPACK' 2025-09-07T06:40:14.6364229Z Entering 
'third_party/aiter' 2025-09-07T06:40:14.6387099Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:40:14.6412721Z Entering 'third_party/benchmark' 2025-09-07T06:40:14.6438942Z Entering 'third_party/composable_kernel' 2025-09-07T06:40:14.6470085Z Entering 'third_party/cpp-httplib' 2025-09-07T06:40:14.6490640Z Entering 'third_party/cpuinfo' 2025-09-07T06:40:14.6511877Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:40:14.6534605Z Entering 'third_party/cutlass' 2025-09-07T06:40:14.6569610Z Entering 'third_party/fbgemm' 2025-09-07T06:40:14.6596767Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:40:14.6629385Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:40:14.6663065Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:40:14.6688441Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:40:14.6722990Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T06:40:14.6744715Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:40:14.6773722Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:40:14.6807430Z Entering 'third_party/flash-attention' 2025-09-07T06:40:14.6834456Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:40:14.6857390Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:40:14.6884299Z Entering 'third_party/flatbuffers' 2025-09-07T06:40:14.6905696Z Entering 'third_party/fmt' 2025-09-07T06:40:14.6929550Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:40:14.6952913Z Entering 'third_party/gloo' 2025-09-07T06:40:14.6981688Z Entering 'third_party/googletest' 2025-09-07T06:40:14.7006157Z Entering 'third_party/ideep' 2025-09-07T06:40:14.7030461Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:40:14.7069801Z Entering 'third_party/ittapi' 2025-09-07T06:40:14.7093108Z Entering 'third_party/kineto' 2025-09-07T06:40:14.7117109Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-09-07T06:40:14.7143696Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:40:14.7176318Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:40:14.7201663Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:40:14.7232069Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:40:14.7257643Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:40:14.7286021Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:40:14.7307485Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:40:14.7327079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:40:14.7353331Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T06:40:14.7377050Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:40:14.7406125Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:40:14.7435083Z Entering 'third_party/kleidiai' 2025-09-07T06:40:14.7459721Z Entering 'third_party/mimalloc' 2025-09-07T06:40:14.7484221Z Entering 'third_party/nlohmann' 2025-09-07T06:40:14.7515565Z Entering 'third_party/onnx' 2025-09-07T06:40:14.7548480Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:40:14.7577584Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:40:14.7603492Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:40:14.7625975Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:40:14.7648662Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:40:14.7677467Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:40:14.7700849Z Entering 
'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:40:14.7735410Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:40:14.7759983Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:40:14.7789106Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:14.7813752Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:14.7837064Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:40:14.7870784Z Entering 'third_party/pocketfft' 2025-09-07T06:40:14.7898799Z Entering 'third_party/protobuf' 2025-09-07T06:40:14.7924048Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:14.7943102Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:14.7966926Z Entering 'third_party/psimd' 2025-09-07T06:40:14.7992420Z Entering 'third_party/pthreadpool' 2025-09-07T06:40:14.8019472Z Entering 'third_party/pybind11' 2025-09-07T06:40:14.8042379Z Entering 'third_party/python-peachpy' 2025-09-07T06:40:14.8077612Z Entering 'third_party/sleef' 2025-09-07T06:40:14.8100611Z Entering 'third_party/tensorpipe' 2025-09-07T06:40:14.8123771Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:40:14.8146409Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:14.8172044Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:14.8189616Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:14.8209752Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:14.8242715Z ##[endgroup] 2025-09-07T06:40:14.8270965Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T06:40:14.8288792Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:14.8381751Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-09-07T06:40:14.8382086Z cd "${GITHUB_WORKSPACE}" 2025-09-07T06:40:14.8382206Z # 
Clean stale submodule dirs 2025-09-07T06:40:14.8382340Z if [ -z "${NO_SUDO}" ]; then 2025-09-07T06:40:14.8382483Z  sudo git submodule foreach --recursive git clean -ffdx 2025-09-07T06:40:14.8382627Z else 2025-09-07T06:40:14.8382748Z  git submodule foreach --recursive git clean -ffdx 2025-09-07T06:40:14.8382884Z fi 2025-09-07T06:40:14.8391305Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:14.8391440Z env: 2025-09-07T06:40:14.8391524Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:14.8391616Z NO_SUDO: true 2025-09-07T06:40:14.8391852Z ##[endgroup] 2025-09-07T06:40:14.8562139Z Entering 'android/libs/fbjni' 2025-09-07T06:40:14.8578802Z Entering 'third_party/FP16' 2025-09-07T06:40:14.8595822Z Entering 'third_party/FXdiv' 2025-09-07T06:40:14.8614785Z Entering 'third_party/NNPACK' 2025-09-07T06:40:14.8636174Z Entering 'third_party/NVTX' 2025-09-07T06:40:14.8657114Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T06:40:14.8686691Z Entering 'third_party/XNNPACK' 2025-09-07T06:40:14.8764881Z Entering 'third_party/aiter' 2025-09-07T06:40:14.8788573Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T06:40:14.8854221Z Entering 'third_party/benchmark' 2025-09-07T06:40:14.8873039Z Entering 'third_party/composable_kernel' 2025-09-07T06:40:14.8952414Z Entering 'third_party/cpp-httplib' 2025-09-07T06:40:14.8972716Z Entering 'third_party/cpuinfo' 2025-09-07T06:40:14.8996111Z Entering 'third_party/cudnn_frontend' 2025-09-07T06:40:14.9019933Z Entering 'third_party/cutlass' 2025-09-07T06:40:14.9081514Z Entering 'third_party/fbgemm' 2025-09-07T06:40:14.9116310Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T06:40:14.9134124Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T06:40:14.9199175Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T06:40:14.9222416Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T06:40:14.9282968Z Entering 'third_party/fbgemm/external/googletest' 
2025-09-07T06:40:14.9306271Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T06:40:14.9323302Z Entering 'third_party/fbgemm/external/json' 2025-09-07T06:40:14.9356670Z Entering 'third_party/flash-attention' 2025-09-07T06:40:14.9377834Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:40:14.9442592Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:40:14.9504447Z Entering 'third_party/flatbuffers' 2025-09-07T06:40:14.9544040Z Entering 'third_party/fmt' 2025-09-07T06:40:14.9563810Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T06:40:14.9581599Z Entering 'third_party/gloo' 2025-09-07T06:40:14.9604254Z Entering 'third_party/googletest' 2025-09-07T06:40:14.9627166Z Entering 'third_party/ideep' 2025-09-07T06:40:14.9644157Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T06:40:14.9721071Z Entering 'third_party/ittapi' 2025-09-07T06:40:14.9738714Z Entering 'third_party/kineto' 2025-09-07T06:40:14.9768012Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T06:40:14.9795437Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T06:40:14.9830494Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T06:40:14.9854698Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T06:40:14.9876876Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T06:40:14.9898756Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T06:40:14.9928750Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T06:40:14.9954177Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T06:40:14.9977306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T06:40:15.0004153Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 
2025-09-07T06:40:15.0023629Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T06:40:15.0041597Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T06:40:15.0062168Z Entering 'third_party/kleidiai' 2025-09-07T06:40:15.0085142Z Entering 'third_party/mimalloc' 2025-09-07T06:40:15.0109248Z Entering 'third_party/nlohmann' 2025-09-07T06:40:15.0138763Z Entering 'third_party/onnx' 2025-09-07T06:40:15.0363587Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T06:40:15.0380518Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T06:40:15.0414704Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T06:40:15.0439730Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T06:40:15.0468300Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T06:40:15.0495643Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T06:40:15.0522180Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T06:40:15.0539794Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T06:40:15.0562321Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T06:40:15.0576630Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T06:40:15.0607052Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T06:40:15.0628113Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T06:40:15.1192211Z Entering 'third_party/pocketfft' 2025-09-07T06:40:15.1220197Z Entering 'third_party/protobuf' 2025-09-07T06:40:15.1286276Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T06:40:15.1313243Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T06:40:15.1337972Z Entering 'third_party/psimd' 2025-09-07T06:40:15.1368293Z Entering 'third_party/pthreadpool' 2025-09-07T06:40:15.1393421Z Entering 
'third_party/pybind11' 2025-09-07T06:40:15.1421378Z Entering 'third_party/python-peachpy' 2025-09-07T06:40:15.1443600Z Entering 'third_party/sleef' 2025-09-07T06:40:15.1480406Z Entering 'third_party/tensorpipe' 2025-09-07T06:40:15.1502559Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T06:40:15.1525096Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T06:40:15.1548725Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T06:40:15.1577026Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T06:40:15.1598600Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T06:40:15.1711245Z Prepare all required actions 2025-09-07T06:40:15.1711504Z Getting action download info 2025-09-07T06:40:15.3709138Z ##[group]Run ./.github/actions/setup-rocm 2025-09-07T06:40:15.3709273Z env: 2025-09-07T06:40:15.3709364Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.3709463Z ##[endgroup] 2025-09-07T06:40:15.3731240Z ##[group]Run dpkg -l | grep -E " rocm" 2025-09-07T06:40:15.3731373Z dpkg -l | grep -E " rocm" 2025-09-07T06:40:15.3737430Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.3737562Z env: 2025-09-07T06:40:15.3737640Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.3737734Z ##[endgroup] 2025-09-07T06:40:15.3793740Z ii rocm-cmake 0.14.0.60401-83~22.04 amd64 rocm-cmake built using CMake 2025-09-07T06:40:15.3794061Z ii rocm-core 6.4.1.60401-83~22.04 amd64 ROCm Runtime software stack 2025-09-07T06:40:15.3794366Z ii rocm-dbgapi 0.77.2.60401-83~22.04 amd64 Library to provide AMD GPU debugger API 2025-09-07T06:40:15.3794631Z ii rocm-debug-agent 2.0.4.60401-83~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent) 2025-09-07T06:40:15.3794884Z ii rocm-dev 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:40:15.3795126Z ii rocm-device-libs 1.0.0.60401-83~22.04 amd64 Radeon Open Compute - device libraries 2025-09-07T06:40:15.3795339Z ii 
rocm-gdb 15.2.60401-83~22.04 amd64 ROCgdb 2025-09-07T06:40:15.3795537Z ii rocm-llvm 19.0.0.25184.60401-83~22.04 amd64 ROCm core compiler 2025-09-07T06:40:15.3795743Z ii rocm-opencl 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-09-07T06:40:15.3795957Z ii rocm-opencl-dev 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-09-07T06:40:15.3796377Z ii rocm-smi-lib 7.5.0.60401-83~22.04 amd64 AMD System Management libraries 2025-09-07T06:40:15.3796742Z ii rocm-utils 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-09-07T06:40:15.3796988Z ii rocminfo 1.0.0.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool 2025-09-07T06:40:15.3814039Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:40:15.3814265Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:40:15.3814420Z # shellcheck disable=SC2046 2025-09-07T06:40:15.3814554Z docker stop $(docker ps -q) || true 2025-09-07T06:40:15.3814683Z # Prune all stopped containers. 2025-09-07T06:40:15.3814815Z docker container prune -f 2025-09-07T06:40:15.3820490Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.3820640Z env: 2025-09-07T06:40:15.3820726Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.3820831Z ##[endgroup] 2025-09-07T06:40:15.4070422Z docker: 'docker stop' requires at least 1 argument 2025-09-07T06:40:15.4070568Z 2025-09-07T06:40:15.4070638Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-09-07T06:40:15.4070739Z 2025-09-07T06:40:15.4070799Z See 'docker stop --help' for more information 2025-09-07T06:40:15.4191730Z Total reclaimed space: 0B 2025-09-07T06:40:15.4235556Z ##[group]Run cat /etc/os-release || true 2025-09-07T06:40:15.4235732Z cat /etc/os-release || true 2025-09-07T06:40:15.4235869Z cat /etc/apt/sources.list.d/rocm.list || true 2025-09-07T06:40:15.4236009Z cat /opt/rocm/.info/version || true 2025-09-07T06:40:15.4236122Z whoami 2025-09-07T06:40:15.4242039Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.4242181Z env: 2025-09-07T06:40:15.4242264Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.4242380Z ##[endgroup] 2025-09-07T06:40:15.4269838Z PRETTY_NAME="Ubuntu 22.04.5 LTS" 2025-09-07T06:40:15.4269978Z NAME="Ubuntu" 2025-09-07T06:40:15.4270391Z VERSION_ID="22.04" 2025-09-07T06:40:15.4270561Z VERSION="22.04.5 LTS (Jammy Jellyfish)" 2025-09-07T06:40:15.4270726Z VERSION_CODENAME=jammy 2025-09-07T06:40:15.4270847Z ID=ubuntu 2025-09-07T06:40:15.4270953Z ID_LIKE=debian 2025-09-07T06:40:15.4271094Z HOME_URL="https://www.ubuntu.com/" 2025-09-07T06:40:15.4271258Z SUPPORT_URL="https://help.ubuntu.com/" 2025-09-07T06:40:15.4271433Z BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" 2025-09-07T06:40:15.4271691Z PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 2025-09-07T06:40:15.4271930Z UBUNTU_CODENAME=jammy 2025-09-07T06:40:15.4279711Z deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.1 jammy main 2025-09-07T06:40:15.4287004Z 6.4.1-83 2025-09-07T06:40:15.4293387Z runner 2025-09-07T06:40:15.4326015Z ##[group]Run dpkg -l | grep -E " amdgpu" 2025-09-07T06:40:15.4326216Z dpkg -l | grep -E " amdgpu" 2025-09-07T06:40:15.4332604Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.4334800Z env: 2025-09-07T06:40:15.4334887Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.4334985Z ##[endgroup] 2025-09-07T06:40:15.4683487Z ii 
amdgpu-core 1:6.4.60401-2164967.22.04 all Core meta package for unified amdgpu driver. 2025-09-07T06:40:15.4683769Z ii amdgpu-install 6.4.60401-2164967.22.04 all AMDGPU driver repository and installer 2025-09-07T06:40:15.4706927Z ##[group]Run rocm-smi 2025-09-07T06:40:15.4707131Z rocm-smi 2025-09-07T06:40:15.4714248Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.4714457Z env: 2025-09-07T06:40:15.4714584Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.4714764Z ##[endgroup] 2025-09-07T06:40:15.5198494Z 2025-09-07T06:40:15.5198776Z 2025-09-07T06:40:15.5199540Z ============================================ ROCm System Management Interface ============================================ 2025-09-07T06:40:15.5200135Z ====================================================== Concise Info ====================================================== 2025-09-07T06:40:15.5200721Z Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2025-09-07T06:40:15.5201809Z  (DID, GUID) (Junction) (Socket) (Mem, Compute, ID)  2025-09-07T06:40:15.5202310Z ========================================================================================================================== 2025-09-07T06:40:15.5203349Z 0 7 0x74b9, 26434 44.0°C 137.0W NPS1, SPX, 0 164Mhz 900Mhz 0% auto 1000.0W 0% 0% 2025-09-07T06:40:15.5203883Z ========================================================================================================================== 2025-09-07T06:40:15.5204309Z ================================================== End of ROCm SMI Log =================================================== 2025-09-07T06:40:15.5267254Z ##[group]Run rocminfo 2025-09-07T06:40:15.5267390Z rocminfo 2025-09-07T06:40:15.5272732Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.5272882Z env: 2025-09-07T06:40:15.5272969Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.5273066Z ##[endgroup] 2025-09-07T06:40:15.5939243Z ROCk module version 6.12.12 is loaded 
2025-09-07T06:40:15.5939488Z ===================== 2025-09-07T06:40:15.5939600Z HSA System Attributes 2025-09-07T06:40:15.5939757Z ===================== 2025-09-07T06:40:15.5939857Z Runtime Version: 1.15 2025-09-07T06:40:15.5939979Z Runtime Ext Version: 1.7 2025-09-07T06:40:15.5940336Z System Timestamp Freq.: 1000.000000MHz 2025-09-07T06:40:15.5940521Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-09-07T06:40:15.5940737Z Machine Model: LARGE 2025-09-07T06:40:15.5940898Z System Endianness: LITTLE 2025-09-07T06:40:15.5941041Z Mwaitx: DISABLED 2025-09-07T06:40:15.5941153Z XNACK enabled: NO 2025-09-07T06:40:15.5941259Z DMAbuf Support: YES 2025-09-07T06:40:15.5941360Z VMM Support: YES 2025-09-07T06:40:15.5941502Z 2025-09-07T06:40:15.5941692Z ========== 2025-09-07T06:40:15.5941785Z HSA Agents 2025-09-07T06:40:15.5941879Z ========== 2025-09-07T06:40:15.5942038Z ******* 2025-09-07T06:40:15.5942130Z Agent 1 2025-09-07T06:40:15.5942327Z ******* 2025-09-07T06:40:15.5942443Z Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:40:15.5942613Z Uuid: CPU-XX 2025-09-07T06:40:15.5942758Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:40:15.5942915Z Vendor Name: CPU 2025-09-07T06:40:15.5943058Z Feature: None specified 2025-09-07T06:40:15.5943209Z Profile: FULL_PROFILE 2025-09-07T06:40:15.5943353Z Float Round Mode: NEAR 2025-09-07T06:40:15.5943496Z Max Queue Number: 0(0x0) 2025-09-07T06:40:15.5943639Z Queue Min Size: 0(0x0) 2025-09-07T06:40:15.5943778Z Queue Max Size: 0(0x0) 2025-09-07T06:40:15.5943918Z Queue Type: MULTI 2025-09-07T06:40:15.5944050Z Node: 0 2025-09-07T06:40:15.5944189Z Device Type: CPU 2025-09-07T06:40:15.5944320Z Cache Info: 2025-09-07T06:40:15.5944560Z L1: 65536(0x10000) KB 2025-09-07T06:40:15.5944691Z Chip ID: 0(0x0) 2025-09-07T06:40:15.5944824Z ASIC Revision: 0(0x0) 2025-09-07T06:40:15.5945004Z Cacheline Size: 64(0x40) 2025-09-07T06:40:15.5945147Z Max Clock Freq. 
(MHz): 0 2025-09-07T06:40:15.5945279Z BDFID: 0 2025-09-07T06:40:15.5945415Z Internal Node ID: 0 2025-09-07T06:40:15.5945557Z Compute Unit: 80 2025-09-07T06:40:15.5945691Z SIMDs per CU: 0 2025-09-07T06:40:15.5950658Z Shader Engines: 0 2025-09-07T06:40:15.5950815Z Shader Arrs. per Eng.: 0 2025-09-07T06:40:15.5950978Z WatchPts on Addr. Ranges:1 2025-09-07T06:40:15.5951117Z Memory Properties: 2025-09-07T06:40:15.5951216Z Features: None 2025-09-07T06:40:15.5951315Z Pool Info: 2025-09-07T06:40:15.5951404Z Pool 1 2025-09-07T06:40:15.5951533Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:15.5951676Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:40:15.5951865Z Allocatable: TRUE 2025-09-07T06:40:15.5952004Z Alloc Granule: 4KB 2025-09-07T06:40:15.5952158Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5952318Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5952572Z Accessible by all: TRUE 2025-09-07T06:40:15.5952706Z Pool 2 2025-09-07T06:40:15.5952822Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:15.5952959Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:40:15.5953089Z Allocatable: TRUE 2025-09-07T06:40:15.5953235Z Alloc Granule: 4KB 2025-09-07T06:40:15.5953380Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5953523Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5953665Z Accessible by all: TRUE 2025-09-07T06:40:15.5953788Z Pool 3 2025-09-07T06:40:15.5953901Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:40:15.5954034Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:40:15.5954164Z Allocatable: TRUE 2025-09-07T06:40:15.5954304Z Alloc Granule: 4KB 2025-09-07T06:40:15.5954447Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5954591Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5954731Z Accessible by all: TRUE 2025-09-07T06:40:15.5954853Z Pool 4 2025-09-07T06:40:15.5954963Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:15.5955096Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:40:15.5955225Z Allocatable: TRUE 2025-09-07T06:40:15.5955365Z Alloc 
Granule: 4KB 2025-09-07T06:40:15.5955512Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5955655Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5955859Z Accessible by all: TRUE 2025-09-07T06:40:15.5955982Z ISA Info: 2025-09-07T06:40:15.5956070Z ******* 2025-09-07T06:40:15.5956155Z Agent 2 2025-09-07T06:40:15.5956241Z ******* 2025-09-07T06:40:15.5956343Z Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:40:15.5956585Z Uuid: CPU-XX 2025-09-07T06:40:15.5956723Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:40:15.5956866Z Vendor Name: CPU 2025-09-07T06:40:15.5957002Z Feature: None specified 2025-09-07T06:40:15.5957137Z Profile: FULL_PROFILE 2025-09-07T06:40:15.5957277Z Float Round Mode: NEAR 2025-09-07T06:40:15.5957417Z Max Queue Number: 0(0x0) 2025-09-07T06:40:15.5957552Z Queue Min Size: 0(0x0) 2025-09-07T06:40:15.5957684Z Queue Max Size: 0(0x0) 2025-09-07T06:40:15.5957900Z Queue Type: MULTI 2025-09-07T06:40:15.5958031Z Node: 1 2025-09-07T06:40:15.5958161Z Device Type: CPU 2025-09-07T06:40:15.5958284Z Cache Info: 2025-09-07T06:40:15.5958388Z L1: 65536(0x10000) KB 2025-09-07T06:40:15.5958515Z Chip ID: 0(0x0) 2025-09-07T06:40:15.5958646Z ASIC Revision: 0(0x0) 2025-09-07T06:40:15.5958868Z Cacheline Size: 64(0x40) 2025-09-07T06:40:15.5959007Z Max Clock Freq. (MHz): 0 2025-09-07T06:40:15.5959145Z BDFID: 0 2025-09-07T06:40:15.5959282Z Internal Node ID: 1 2025-09-07T06:40:15.5959421Z Compute Unit: 80 2025-09-07T06:40:15.5959559Z SIMDs per CU: 0 2025-09-07T06:40:15.5959699Z Shader Engines: 0 2025-09-07T06:40:15.5959843Z Shader Arrs. per Eng.: 0 2025-09-07T06:40:15.5959987Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:40:15.5960116Z Memory Properties: 2025-09-07T06:40:15.5960213Z Features: None 2025-09-07T06:40:15.5960309Z Pool Info: 2025-09-07T06:40:15.5960403Z Pool 1 2025-09-07T06:40:15.5960522Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:15.5960661Z Size: 656328592(0x271ec790) KB 2025-09-07T06:40:15.5960797Z Allocatable: TRUE 2025-09-07T06:40:15.5960942Z Alloc Granule: 4KB 2025-09-07T06:40:15.5961091Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5961269Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5961414Z Accessible by all: TRUE 2025-09-07T06:40:15.5961540Z Pool 2 2025-09-07T06:40:15.5961655Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:15.5961792Z Size: 656328592(0x271ec790) KB 2025-09-07T06:40:15.5961923Z Allocatable: TRUE 2025-09-07T06:40:15.5962067Z Alloc Granule: 4KB 2025-09-07T06:40:15.5962215Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5962424Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5962569Z Accessible by all: TRUE 2025-09-07T06:40:15.5962693Z Pool 3 2025-09-07T06:40:15.5962810Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:40:15.5962943Z Size: 656328592(0x271ec790) KB 2025-09-07T06:40:15.5963078Z Allocatable: TRUE 2025-09-07T06:40:15.5963219Z Alloc Granule: 4KB 2025-09-07T06:40:15.5963365Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5963513Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5963657Z Accessible by all: TRUE 2025-09-07T06:40:15.5963783Z Pool 4 2025-09-07T06:40:15.5966228Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:15.5966370Z Size: 656328592(0x271ec790) KB 2025-09-07T06:40:15.5966621Z Allocatable: TRUE 2025-09-07T06:40:15.5966760Z Alloc Granule: 4KB 2025-09-07T06:40:15.5966907Z Alloc Recommended Granule:4KB 2025-09-07T06:40:15.5967052Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5967193Z Accessible by all: TRUE 2025-09-07T06:40:15.5967320Z ISA Info: 2025-09-07T06:40:15.5967411Z ******* 2025-09-07T06:40:15.5967500Z Agent 3 2025-09-07T06:40:15.5967587Z ******* 
2025-09-07T06:40:15.5967756Z Name: gfx942 2025-09-07T06:40:15.5967890Z Uuid: GPU-d52b70587e52af6d 2025-09-07T06:40:15.5968040Z Marketing Name: AMD Instinct Mi325X VF 2025-09-07T06:40:15.5968190Z Vendor Name: AMD 2025-09-07T06:40:15.5968328Z Feature: KERNEL_DISPATCH 2025-09-07T06:40:15.5968464Z Profile: BASE_PROFILE 2025-09-07T06:40:15.5968601Z Float Round Mode: NEAR 2025-09-07T06:40:15.5968739Z Max Queue Number: 128(0x80) 2025-09-07T06:40:15.5968872Z Queue Min Size: 64(0x40) 2025-09-07T06:40:15.5970613Z Queue Max Size: 131072(0x20000) 2025-09-07T06:40:15.5970757Z Queue Type: MULTI 2025-09-07T06:40:15.5970888Z Node: 2 2025-09-07T06:40:15.5971017Z Device Type: GPU 2025-09-07T06:40:15.5971134Z Cache Info: 2025-09-07T06:40:15.5971235Z L1: 32(0x20) KB 2025-09-07T06:40:15.5971352Z L2: 4096(0x1000) KB 2025-09-07T06:40:15.5971468Z L3: 262144(0x40000) KB 2025-09-07T06:40:15.5971587Z Chip ID: 29881(0x74b9) 2025-09-07T06:40:15.5971720Z ASIC Revision: 1(0x1) 2025-09-07T06:40:15.5971858Z Cacheline Size: 128(0x80) 2025-09-07T06:40:15.5971996Z Max Clock Freq. (MHz): 2100 2025-09-07T06:40:15.5972126Z BDFID: 37632 2025-09-07T06:40:15.5973574Z Internal Node ID: 2 2025-09-07T06:40:15.5973711Z Compute Unit: 304 2025-09-07T06:40:15.5973910Z SIMDs per CU: 4 2025-09-07T06:40:15.5974045Z Shader Engines: 32 2025-09-07T06:40:15.5974185Z Shader Arrs. per Eng.: 1 2025-09-07T06:40:15.5974328Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:40:15.5974473Z Coherent Host Access: FALSE 2025-09-07T06:40:15.5974598Z Memory Properties: 2025-09-07T06:40:15.5974702Z Features: KERNEL_DISPATCH 2025-09-07T06:40:15.5974830Z Fast F16 Operation: TRUE 2025-09-07T06:40:15.5974971Z Wavefront Size: 64(0x40) 2025-09-07T06:40:15.5975112Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:15.5976605Z Workgroup Max Size per Dimension: 2025-09-07T06:40:15.5976727Z x 1024(0x400) 2025-09-07T06:40:15.5976847Z y 1024(0x400) 2025-09-07T06:40:15.5976959Z z 1024(0x400) 2025-09-07T06:40:15.5977084Z Max Waves Per CU: 32(0x20) 2025-09-07T06:40:15.5977223Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:40:15.5977362Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:15.5977486Z Grid Max Size per Dimension: 2025-09-07T06:40:15.5977588Z x 4294967295(0xffffffff) 2025-09-07T06:40:15.5977702Z y 4294967295(0xffffffff) 2025-09-07T06:40:15.5977816Z z 4294967295(0xffffffff) 2025-09-07T06:40:15.5977947Z Max fbarriers/Workgrp: 32 2025-09-07T06:40:15.5980428Z Packet Processor uCode:: 177 2025-09-07T06:40:15.5982388Z SDMA engine uCode:: 24 2025-09-07T06:40:15.5982533Z IOMMU Support:: None 2025-09-07T06:40:15.5982655Z Pool Info: 2025-09-07T06:40:15.5982748Z Pool 1 2025-09-07T06:40:15.5982867Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:40:15.5983007Z Size: 268107776(0xffb0000) KB 2025-09-07T06:40:15.5983141Z Allocatable: TRUE 2025-09-07T06:40:15.5983281Z Alloc Granule: 4KB 2025-09-07T06:40:15.5983428Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:15.5983576Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5983723Z Accessible by all: FALSE 2025-09-07T06:40:15.5983850Z Pool 2 2025-09-07T06:40:15.5985315Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:40:15.5985457Z Size: 268107776(0xffb0000) KB 2025-09-07T06:40:15.5985589Z Allocatable: TRUE 2025-09-07T06:40:15.5985726Z Alloc Granule: 4KB 2025-09-07T06:40:15.5985871Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:15.5986016Z Alloc Alignment: 4KB 
2025-09-07T06:40:15.5986157Z Accessible by all: FALSE 2025-09-07T06:40:15.5986280Z Pool 3 2025-09-07T06:40:15.5986391Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:40:15.5986660Z Size: 268107776(0xffb0000) KB 2025-09-07T06:40:15.5986793Z Allocatable: TRUE 2025-09-07T06:40:15.5987006Z Alloc Granule: 4KB 2025-09-07T06:40:15.5988455Z Alloc Recommended Granule:2048KB 2025-09-07T06:40:15.5988610Z Alloc Alignment: 4KB 2025-09-07T06:40:15.5988752Z Accessible by all: FALSE 2025-09-07T06:40:15.5988874Z Pool 4 2025-09-07T06:40:15.5988982Z Segment: GROUP 2025-09-07T06:40:15.5989107Z Size: 64(0x40) KB 2025-09-07T06:40:15.5989238Z Allocatable: FALSE 2025-09-07T06:40:15.5989377Z Alloc Granule: 0KB 2025-09-07T06:40:15.5989519Z Alloc Recommended Granule:0KB 2025-09-07T06:40:15.5989668Z Alloc Alignment: 0KB 2025-09-07T06:40:15.5989814Z Accessible by all: FALSE 2025-09-07T06:40:15.5989938Z ISA Info: 2025-09-07T06:40:15.5990026Z ISA 1 2025-09-07T06:40:15.5991393Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-09-07T06:40:15.5991544Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:15.5991689Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:15.5991833Z Default Rounding Mode: NEAR 2025-09-07T06:40:15.5991980Z Default Rounding Mode: NEAR 2025-09-07T06:40:15.5992116Z Fast f16: TRUE 2025-09-07T06:40:15.5992255Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:15.5992450Z Workgroup Max Size per Dimension: 2025-09-07T06:40:15.5992568Z x 1024(0x400) 2025-09-07T06:40:15.5992690Z y 1024(0x400) 2025-09-07T06:40:15.5992804Z z 1024(0x400) 2025-09-07T06:40:15.5993060Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:15.5995489Z Grid Max Size per Dimension: 2025-09-07T06:40:15.5995644Z x 4294967295(0xffffffff) 2025-09-07T06:40:15.5995765Z y 4294967295(0xffffffff) 2025-09-07T06:40:15.5995882Z z 4294967295(0xffffffff) 2025-09-07T06:40:15.5996013Z FBarrier Max Size: 32 2025-09-07T06:40:15.5996136Z ISA 2 2025-09-07T06:40:15.5996268Z Name: 
amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-09-07T06:40:15.5996430Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:40:15.5996748Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:40:15.5996900Z Default Rounding Mode: NEAR 2025-09-07T06:40:15.5998792Z Default Rounding Mode: NEAR 2025-09-07T06:40:15.5998956Z Fast f16: TRUE 2025-09-07T06:40:15.5999103Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:40:15.5999234Z Workgroup Max Size per Dimension: 2025-09-07T06:40:15.5999348Z x 1024(0x400) 2025-09-07T06:40:15.5999463Z y 1024(0x400) 2025-09-07T06:40:15.5999574Z z 1024(0x400) 2025-09-07T06:40:15.5999703Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:40:15.5999829Z Grid Max Size per Dimension: 2025-09-07T06:40:15.6000029Z x 4294967295(0xffffffff) 2025-09-07T06:40:15.6000146Z y 4294967295(0xffffffff) 2025-09-07T06:40:15.6000260Z z 4294967295(0xffffffff) 2025-09-07T06:40:15.6001590Z FBarrier Max Size: 32 2025-09-07T06:40:15.6001716Z *** Done *** 2025-09-07T06:40:15.6017056Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-09-07T06:40:15.6017235Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-09-07T06:40:15.6017500Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-09-07T06:40:15.6017744Z if [[ $ngpu -eq 0 ]]; then 2025-09-07T06:40:15.6017885Z  echo "Error: Failed to detect any GPUs on the runner" 2025-09-07T06:40:15.6018015Z  echo "$msg" 2025-09-07T06:40:15.6018108Z  exit 1 2025-09-07T06:40:15.6018189Z fi 2025-09-07T06:40:15.6022736Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.6024735Z env: 2025-09-07T06:40:15.6024822Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.6024915Z ##[endgroup] 2025-09-07T06:40:15.6792404Z ##[group]Run pytorch/pytorch/.github/actions/diskspace-cleanup@main 2025-09-07T06:40:15.6792583Z with: 2025-09-07T06:40:15.6792684Z diskspace-cutoff: 70 2025-09-07T06:40:15.6792782Z env: 2025-09-07T06:40:15.6792879Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.6792984Z ##[endgroup] 2025-09-07T06:40:15.6817130Z ##[group]Run set -ex 2025-09-07T06:40:15.6817285Z set -ex 2025-09-07T06:40:15.6817389Z diskspace_cutoff=70 2025-09-07T06:40:15.6817532Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-09-07T06:40:15.6817871Z if [ ! -d "$docker_root_dir" ]; then 2025-09-07T06:40:15.6818066Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-09-07T06:40:15.6818272Z  exit 0 2025-09-07T06:40:15.6818369Z fi 2025-09-07T06:40:15.6818528Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-09-07T06:40:15.6818856Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-09-07T06:40:15.6819130Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-09-07T06:40:15.6819273Z  docker system prune -af 2025-09-07T06:40:15.6819459Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-09-07T06:40:15.6819665Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-09-07T06:40:15.6819894Z  echo "Error: Available diskspace is less than $diskspace_cutoff percent. Not enough diskspace." 2025-09-07T06:40:15.6826088Z  echo "$msg" 2025-09-07T06:40:15.6826204Z  exit 1 2025-09-07T06:40:15.6826299Z  else 2025-09-07T06:40:15.6826407Z  difference=$((diskspace - diskspace_new)) 2025-09-07T06:40:15.6826650Z  echo "Diskspace saved: $difference percent" 2025-09-07T06:40:15.6826771Z  fi 2025-09-07T06:40:15.6826853Z fi 2025-09-07T06:40:15.6832864Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.6832999Z env: 2025-09-07T06:40:15.6833081Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.6833176Z ##[endgroup] 2025-09-07T06:40:15.6858184Z + diskspace_cutoff=70 2025-09-07T06:40:15.6860239Z ++ docker info -f '{{.DockerRootDir}}' 2025-09-07T06:40:15.7230900Z + docker_root_dir=/home/runner/docker-data 2025-09-07T06:40:15.7239655Z + '[' '!' -d /home/runner/docker-data ']' 2025-09-07T06:40:15.7240428Z ++ df -H --output=pcent /home/runner/docker-data 2025-09-07T06:40:15.7240610Z ++ sed -n 2p 2025-09-07T06:40:15.7241014Z ++ sed s/%// 2025-09-07T06:40:15.7241102Z ++ sed 's/ //' 2025-09-07T06:40:15.7253329Z + diskspace=16 2025-09-07T06:40:15.7253608Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified' 2025-09-07T06:40:15.7253858Z + [[ 16 -ge 70 ]] 2025-09-07T06:40:15.7269746Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-09-07T06:40:15.7269939Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-09-07T06:40:15.7270082Z rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-09-07T06:40:15.7270216Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-09-07T06:40:15.7270394Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-09-07T06:40:15.7270545Z  2025-09-07T06:40:15.7274833Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-09-07T06:40:15.7275027Z rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-09-07T06:40:15.7275165Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-09-07T06:40:15.7275364Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-09-07T06:40:15.7275536Z  2025-09-07T06:40:15.7275630Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-09-07T06:40:15.7275760Z rm -rf "${RUNNER_DOCS_DIR}" 2025-09-07T06:40:15.7275871Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-09-07T06:40:15.7276025Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-09-07T06:40:15.7280490Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.7280629Z env: 2025-09-07T06:40:15.7280710Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.7280806Z ##[endgroup] 2025-09-07T06:40:15.7348473Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:40:15.7348792Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:40:15.7350900Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:40:15.7354943Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.7355073Z env: 2025-09-07T06:40:15.7355152Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.7355271Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:15.7355430Z RUNNER_TEST_RESULTS_DIR: 
/home/runner/_work/_temp/test-results 2025-09-07T06:40:15.7355581Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:15.7355697Z ##[endgroup] 2025-09-07T06:40:15.7412411Z ##[group]Run # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-09-07T06:40:15.7412735Z # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-09-07T06:40:15.7412938Z # Add render group for container creation. 2025-09-07T06:40:15.7413120Z render_gid=`cat /etc/group | grep render | cut -d: -f3` 2025-09-07T06:40:15.7413324Z # Ensure GPU isolation if pod is part of kubernetes setup with DEVICE_FLAG. 2025-09-07T06:40:15.7413540Z if [ -f "/etc/podinfo/gha-render-devices" ]; then 2025-09-07T06:40:15.7413708Z  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices) 2025-09-07T06:40:15.7413855Z else 2025-09-07T06:40:15.7413962Z  DEVICE_FLAG="--device /dev/dri" 2025-09-07T06:40:15.7414085Z fi 2025-09-07T06:40:15.7414264Z # The --group-add daemon and --group-add bin are needed in the Ubuntu 24.04 and Almalinux OSs respectively. 2025-09-07T06:40:15.7414561Z # This is due to the device files (/dev/kfd & /dev/dri) being owned by video group on bare metal. 2025-09-07T06:40:15.7414814Z # This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries. 2025-09-07T06:40:15.7415075Z # The group name corresponding to group ID 1 can change depending on the OS, so both are necessary. 
2025-09-07T06:40:15.7415515Z echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}" 2025-09-07T06:40:15.7421793Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:15.7421934Z env: 2025-09-07T06:40:15.7422028Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.7422159Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:15.7422333Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:15.7422512Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:15.7422644Z ##[endgroup] 2025-09-07T06:40:15.7531036Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-09-07T06:40:15.7531236Z with: 2025-09-07T06:40:15.7531369Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only 2025-09-07T06:40:15.7531527Z aws-region: us-east-1 2025-09-07T06:40:15.7531640Z role-duration-seconds: 18000 2025-09-07T06:40:15.7531749Z audience: sts.amazonaws.com 2025-09-07T06:40:15.7531842Z env: 2025-09-07T06:40:15.7531921Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:15.7534352Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:15.7534525Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:15.7534674Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:15.7535052Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:15.7535404Z ##[endgroup] 2025-09-07T06:40:15.9309108Z Assuming role with OIDC 2025-09-07T06:40:16.0502871Z Authenticated as assumedRoleId AROAUPVRELQNLLCOPFEJR:GitHubActions 2025-09-07T06:40:16.1060849Z ##[group]Run 
aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 2025-09-07T06:40:16.1061050Z with: 2025-09-07T06:40:16.1061140Z mask-password: true 2025-09-07T06:40:16.1061236Z registry-type: private 2025-09-07T06:40:16.1061333Z skip-logout: false 2025-09-07T06:40:16.1061420Z env: 2025-09-07T06:40:16.1061501Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:16.1061627Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:16.1061791Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:16.1061943Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:16.1062318Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:16.1062674Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:16.1062792Z AWS_REGION: us-east-1 2025-09-07T06:40:16.1063330Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:16.1063483Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:16.1065511Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:16.1065606Z ##[endgroup] 2025-09-07T06:40:16.3022357Z Logging into registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.4937419Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-09-07T06:40:16.4937606Z with: 2025-09-07T06:40:16.4937873Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.4938166Z use-custom-docker-registry: true 2025-09-07T06:40:16.4938289Z docker-build-dir: .ci/docker 2025-09-07T06:40:16.4938407Z docker-build-script: ./build.sh 2025-09-07T06:40:16.4938524Z working-directory: . 
2025-09-07T06:40:16.4938659Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.4938808Z force-push: false 2025-09-07T06:40:16.4938894Z env: 2025-09-07T06:40:16.4938978Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:16.4939110Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:16.4939449Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:16.4939620Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:16.4939997Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:16.4940357Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:16.4940465Z AWS_REGION: us-east-1 2025-09-07T06:40:16.4940721Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:16.4940873Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:16.4942940Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:16.4943040Z ##[endgroup] 2025-09-07T06:40:16.4957676Z ##[group]Run set -ex 2025-09-07T06:40:16.4957875Z set -ex 2025-09-07T06:40:16.4957959Z  2025-09-07T06:40:16.4958107Z # If the docker build directory or the build script doesn't exist, the action will 2025-09-07T06:40:16.4958349Z # gracefully return the docker image name as it is. 
Pulling docker image in Linux 2025-09-07T06:40:16.4958552Z # job could then download the pre-built image as usual 2025-09-07T06:40:16.4961609Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-09-07T06:40:16.4961857Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4961978Z else 2025-09-07T06:40:16.4962077Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4962240Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4962385Z  2025-09-07T06:40:16.4962583Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-09-07T06:40:16.4962804Z  exit 0 2025-09-07T06:40:16.4962892Z fi 2025-09-07T06:40:16.4962975Z  2025-09-07T06:40:16.4963102Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-09-07T06:40:16.4963319Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-09-07T06:40:16.4963509Z  # use it as it is, but first let's extract the tag 2025-09-07T06:40:16.4965711Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-09-07T06:40:16.4965895Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4966069Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4966212Z else 2025-09-07T06:40:16.4966317Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-09-07T06:40:16.4966460Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-09-07T06:40:16.4966696Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-09-07T06:40:16.4966821Z  fi 2025-09-07T06:40:16.4967118Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-09-07T06:40:16.4967334Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4969458Z  echo 
"docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4969716Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.4969868Z fi 2025-09-07T06:40:16.4975442Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:16.4975577Z env: 2025-09-07T06:40:16.4975665Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:16.4975798Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:16.4975966Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:16.4976123Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:16.4976674Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:16.4979177Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:16.4979296Z AWS_REGION: us-east-1 2025-09-07T06:40:16.4979432Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:16.4979578Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:16.4981734Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:16.4981832Z REPO_NAME: pytorch 2025-09-07T06:40:16.4982100Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.4982383Z DOCKER_BUILD_DIR: .ci/docker 2025-09-07T06:40:16.4982495Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-09-07T06:40:16.4982637Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.4982787Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-09-07T06:40:16.4984811Z CUSTOM_TAG_PREFIX: 2025-09-07T06:40:16.4984907Z ##[endgroup] 2025-09-07T06:40:16.5004787Z + [[ -d .ci/docker ]] 2025-09-07T06:40:16.5004990Z + [[ -f .ci/docker/./build.sh ]] 2025-09-07T06:40:16.5005130Z + [[ true == \t\r\u\e ]] 2025-09-07T06:40:16.5005253Z + echo skip=false 
2025-09-07T06:40:16.5005642Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-09-07T06:40:16.5009538Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.5010652Z ++ awk -F '[:,]' '{print $2}' 2025-09-07T06:40:16.5017548Z + DOCKER_TAG=pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.5017922Z + echo docker-tag=pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.5018478Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.5046590Z ##[group]Run set +e 2025-09-07T06:40:16.5046717Z set +e 2025-09-07T06:40:16.5046807Z set -x 2025-09-07T06:40:16.5048943Z  2025-09-07T06:40:16.5049029Z login() { 2025-09-07T06:40:16.5049219Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T06:40:16.5049409Z } 2025-09-07T06:40:16.5049490Z  2025-09-07T06:40:16.5049571Z retry () { 2025-09-07T06:40:16.5049681Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T06:40:16.5049799Z } 2025-09-07T06:40:16.5049877Z  2025-09-07T06:40:16.5049965Z retry login "${DOCKER_REGISTRY}" 2025-09-07T06:40:16.5050079Z  2025-09-07T06:40:16.5050178Z START_TIME=$(date +%s) 2025-09-07T06:40:16.5050293Z # Wait up to 120 minutes 2025-09-07T06:40:16.5050559Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-09-07T06:40:16.5050753Z  # Check if image already exists, if it does then skip building it 2025-09-07T06:40:16.5050936Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-09-07T06:40:16.5051078Z  exit 0 2025-09-07T06:40:16.5051180Z  
fi 2025-09-07T06:40:16.5051266Z  2025-09-07T06:40:16.5051414Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-09-07T06:40:16.5051649Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-09-07T06:40:16.5051884Z  # latter, it will wait for the Docker images to become available before continuing 2025-09-07T06:40:16.5053807Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-09-07T06:40:16.5054036Z  # It's a Docker build job, let's build the image 2025-09-07T06:40:16.5054163Z  break 2025-09-07T06:40:16.5054259Z  else 2025-09-07T06:40:16.5054388Z  # It's a regular build job, wait for the image to become available 2025-09-07T06:40:16.5054537Z  sleep 300 2025-09-07T06:40:16.5054631Z  fi 2025-09-07T06:40:16.5054712Z done 2025-09-07T06:40:16.5054794Z  2025-09-07T06:40:16.5054924Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-09-07T06:40:16.5055124Z # be empty. The default action would be to continue rebuild the image 2025-09-07T06:40:16.5057015Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-09-07T06:40:16.5057185Z  # if we're on the base branch then use the parent commit 2025-09-07T06:40:16.5057333Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-09-07T06:40:16.5057446Z else 2025-09-07T06:40:16.5057573Z  # otherwise we're on a PR, so use the most recent base commit 2025-09-07T06:40:16.5057749Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-09-07T06:40:16.5057880Z fi 2025-09-07T06:40:16.5057959Z  2025-09-07T06:40:16.5058049Z if [[ -z "${MERGE_BASE}" ]]; then 2025-09-07T06:40:16.5058185Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.5058305Z  2025-09-07T06:40:16.5060017Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-09-07T06:40:16.5060210Z  exit 0 2025-09-07T06:40:16.5060297Z fi 2025-09-07T06:40:16.5060376Z  2025-09-07T06:40:16.5060491Z if ! 
git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-09-07T06:40:16.5060731Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-09-07T06:40:16.5060941Z  exit 1 2025-09-07T06:40:16.5061024Z fi 2025-09-07T06:40:16.5061102Z  2025-09-07T06:40:16.5061241Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-09-07T06:40:16.5061474Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-09-07T06:40:16.5063199Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-09-07T06:40:16.5063437Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-09-07T06:40:16.5063702Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-09-07T06:40:16.5063863Z fi 2025-09-07T06:40:16.5063942Z  2025-09-07T06:40:16.5064041Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-09-07T06:40:16.5068026Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:16.5068170Z env: 2025-09-07T06:40:16.5068257Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:16.5068387Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:16.5070447Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:16.5070610Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:16.5070981Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:16.5071343Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:16.5071450Z AWS_REGION: us-east-1 2025-09-07T06:40:16.5071590Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:16.5071753Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:16.5073851Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:16.5073954Z 
DOCKER_BUILD_DIR: .ci/docker 2025-09-07T06:40:16.5074088Z BASE_REVISION: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:40:16.5076138Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.5076581Z DOCKER_TAG: pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:16.5076803Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.5076949Z DOCKER_PUSH: 2025-09-07T06:40:16.5077038Z ##[endgroup] 2025-09-07T06:40:16.5093811Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.5094014Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.5096277Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:40:16.5096912Z /home/runner/_work/_temp/913d6ba6-184e-44ec-9c19-6d13f5435dda.sh: line 5: aws: command not found 2025-09-07T06:40:16.5097888Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:16.5213509Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:16.5224023Z + sleep 1 2025-09-07T06:40:17.5235780Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:17.5237475Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:40:17.5238268Z /home/runner/_work/_temp/913d6ba6-184e-44ec-9c19-6d13f5435dda.sh: line 5: aws: command not found 2025-09-07T06:40:17.5246456Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:17.5336609Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:17.5349016Z + sleep 2 2025-09-07T06:40:19.5358241Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:19.5361033Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:40:19.5361688Z /home/runner/_work/_temp/913d6ba6-184e-44ec-9c19-6d13f5435dda.sh: line 5: aws: 
command not found 2025-09-07T06:40:19.5362427Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:19.5472682Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:19.5487380Z ++ date +%s 2025-09-07T06:40:19.5507795Z + START_TIME=1757227219 2025-09-07T06:40:19.5509061Z ++ date +%s 2025-09-07T06:40:19.5518128Z + [[ 1757220019 -lt 1757227219 ]] 2025-09-07T06:40:19.5518790Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:20.0738495Z { 2025-09-07T06:40:20.0738709Z "schemaVersion": 2, 2025-09-07T06:40:20.0739021Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-09-07T06:40:20.0739305Z "config": { 2025-09-07T06:40:20.0739516Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-09-07T06:40:20.0739756Z "size": 28673, 2025-09-07T06:40:20.0739995Z "digest": "sha256:75a9a3098f66b0be74794dd2bc3dbb7161d42e50706a0abd073b4e2e9b01a0df" 2025-09-07T06:40:20.0740269Z }, 2025-09-07T06:40:20.0740382Z "layers": [ 2025-09-07T06:40:20.0740534Z { 2025-09-07T06:40:20.0740733Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0740969Z "size": 30592514, 2025-09-07T06:40:20.0741529Z "digest": "sha256:de66fc90c55d156d6760975acf0904d151017e48c9cfc68beedb51af31dc792e" 2025-09-07T06:40:20.0741806Z }, 2025-09-07T06:40:20.0741919Z { 2025-09-07T06:40:20.0750051Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0750320Z "size": 1554, 2025-09-07T06:40:20.0750575Z "digest": "sha256:efc45b9044a6cbae9d1981fa8f749b3b24e14bf1e2227b92e3e19d9f6f73f452" 2025-09-07T06:40:20.0750838Z }, 2025-09-07T06:40:20.0750950Z { 2025-09-07T06:40:20.0751142Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0751364Z "size": 335761518, 2025-09-07T06:40:20.0751612Z "digest": 
"sha256:06ce422a41d1c7cf432f8974e1d58813ddd8b819e07f30b0fb9e4b60a59cae0f" 2025-09-07T06:40:20.0751872Z }, 2025-09-07T06:40:20.0751977Z { 2025-09-07T06:40:20.0752375Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0752589Z "size": 703, 2025-09-07T06:40:20.0752809Z "digest": "sha256:673cf5ffa968806cdb68202cfe5926a9aec2cf5d3767ae0ff0da0ec13944178b" 2025-09-07T06:40:20.0753005Z }, 2025-09-07T06:40:20.0753084Z { 2025-09-07T06:40:20.0753221Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0753395Z "size": 1767, 2025-09-07T06:40:20.0753572Z "digest": "sha256:3042b077c06a48f78067f51e7ff8452d751af6ee0fbed1b4b316f96cc5e57e43" 2025-09-07T06:40:20.0753761Z }, 2025-09-07T06:40:20.0753902Z { 2025-09-07T06:40:20.0754037Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0754201Z "size": 486, 2025-09-07T06:40:20.0754376Z "digest": "sha256:ed25a020f194dda6e6ab0877fd48493d87d9f9c32f4080506829d4e1466654da" 2025-09-07T06:40:20.0754566Z }, 2025-09-07T06:40:20.0754649Z { 2025-09-07T06:40:20.0754787Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0754963Z "size": 120654516, 2025-09-07T06:40:20.0755150Z "digest": "sha256:a5876169851fc36e12eee569ea9b8bc8148ca43a0154e2890abf9e7b4313d42f" 2025-09-07T06:40:20.0755337Z }, 2025-09-07T06:40:20.0755420Z { 2025-09-07T06:40:20.0755555Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0755723Z "size": 4211, 2025-09-07T06:40:20.0755898Z "digest": "sha256:4971bfcf31c16df24b1203a98f4441cffb29a6802fd2e2e72524ad3c72648257" 2025-09-07T06:40:20.0756090Z }, 2025-09-07T06:40:20.0758345Z { 2025-09-07T06:40:20.0758489Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0758673Z "size": 1709, 2025-09-07T06:40:20.0758846Z "digest": "sha256:4d141034e9db8b1efd107fd8b817c312ebbfb12750bb3d105c969fc395cdb30f" 2025-09-07T06:40:20.0759035Z }, 
2025-09-07T06:40:20.0759115Z { 2025-09-07T06:40:20.0759247Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0759418Z "size": 724, 2025-09-07T06:40:20.0759596Z "digest": "sha256:11edb6ea0bca3be307ef836b0bd07999ff562bcb7a807f5e6c9f7d4d5f976b5d" 2025-09-07T06:40:20.0759892Z }, 2025-09-07T06:40:20.0762569Z { 2025-09-07T06:40:20.0762735Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0763011Z "size": 3413656367, 2025-09-07T06:40:20.0763169Z "digest": "sha256:dc4852f72739939e80f47cc3e9ca55450851f269cad1f92f0727db6034824034" 2025-09-07T06:40:20.0763329Z }, 2025-09-07T06:40:20.0763396Z { 2025-09-07T06:40:20.0763512Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0763656Z "size": 381, 2025-09-07T06:40:20.0763800Z "digest": "sha256:829c85269cfc15438c511c0c1653d636a5028595b003f37e6a1a7f7bc8a41e13" 2025-09-07T06:40:20.0763965Z }, 2025-09-07T06:40:20.0765484Z { 2025-09-07T06:40:20.0765612Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0765767Z "size": 65173690, 2025-09-07T06:40:20.0765946Z "digest": "sha256:77ab4e659dd80460fd49a6261b6f368f2c70f74564b0f42dda6754c258191401" 2025-09-07T06:40:20.0766107Z }, 2025-09-07T06:40:20.0766260Z { 2025-09-07T06:40:20.0766378Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0766587Z "size": 792, 2025-09-07T06:40:20.0766729Z "digest": "sha256:c0da146487b65750b761c379246382215960693f02f4d35da4123d108fa13e2c" 2025-09-07T06:40:20.0766889Z }, 2025-09-07T06:40:20.0768210Z { 2025-09-07T06:40:20.0768335Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0768477Z "size": 106, 2025-09-07T06:40:20.0768626Z "digest": "sha256:a61c8111f4664262286755a8d5cfbae93144f18983033686df72956f655fd8da" 2025-09-07T06:40:20.0768782Z }, 2025-09-07T06:40:20.0768850Z { 2025-09-07T06:40:20.0768969Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0769110Z "size": 1495, 2025-09-07T06:40:20.0769324Z "digest": "sha256:720fb67e397fe91e4223cbbcd9dc794509a831119befac538606189a03cfec2a" 2025-09-07T06:40:20.0769485Z }, 2025-09-07T06:40:20.0769554Z { 2025-09-07T06:40:20.0770749Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0770899Z "size": 544075804, 2025-09-07T06:40:20.0771053Z "digest": "sha256:fa7524284edda12fb597eff06ec9d91998a88c828ea68d17de3acd97c5c013a2" 2025-09-07T06:40:20.0771215Z }, 2025-09-07T06:40:20.0771282Z { 2025-09-07T06:40:20.0771395Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0771536Z "size": 163, 2025-09-07T06:40:20.0771683Z "digest": "sha256:57cfc9fee363cabe5cb12ea4bf911816673ac774454cab777c9c56de243e5d11" 2025-09-07T06:40:20.0771846Z }, 2025-09-07T06:40:20.0771914Z { 2025-09-07T06:40:20.0772027Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0773148Z "size": 2484, 2025-09-07T06:40:20.0773305Z "digest": "sha256:c9e2c9bad36d9f9352a75c2fbfe035a0afc8dfdccd0b3e90b24416c6ba2a7752" 2025-09-07T06:40:20.0773475Z }, 2025-09-07T06:40:20.0773542Z { 2025-09-07T06:40:20.0773660Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0773804Z "size": 8101153352, 2025-09-07T06:40:20.0773961Z "digest": "sha256:8257abce8e9b64cd390c42a72eadc914605f609b05bd66b7bdd8dcd3c69762e6" 2025-09-07T06:40:20.0774124Z }, 2025-09-07T06:40:20.0774191Z { 2025-09-07T06:40:20.0774335Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0774480Z "size": 105, 2025-09-07T06:40:20.0775612Z "digest": "sha256:8b81352a9241e7e164f3914cdaddd7621691d5819a09d7bdd73d33dd6efb95b0" 2025-09-07T06:40:20.0775778Z }, 2025-09-07T06:40:20.0775844Z { 2025-09-07T06:40:20.0775958Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0776099Z "size": 612, 
2025-09-07T06:40:20.0776244Z "digest": "sha256:5acab4245292ebd11967968c31dcfd205a062ba8d51cd5434d1769be24bac138" 2025-09-07T06:40:20.0776404Z }, 2025-09-07T06:40:20.0776471Z { 2025-09-07T06:40:20.0776675Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0776824Z "size": 677677428, 2025-09-07T06:40:20.0776976Z "digest": "sha256:93616f9ff93b1a86c403d552a522bd7f5e94087dfae8591a702cb8ff6093fae6" 2025-09-07T06:40:20.0778117Z }, 2025-09-07T06:40:20.0778189Z { 2025-09-07T06:40:20.0778303Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0778445Z "size": 111, 2025-09-07T06:40:20.0778595Z "digest": "sha256:89ff93a7db63b67e6ddd0b6e69eb9ffeab124cbf549858da43069963395d404d" 2025-09-07T06:40:20.0778758Z }, 2025-09-07T06:40:20.0778824Z { 2025-09-07T06:40:20.0778937Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0779080Z "size": 1556, 2025-09-07T06:40:20.0779226Z "digest": "sha256:adfe896bf742a7a6b30805ef505693d995588c103230a34431783a81ca85c077" 2025-09-07T06:40:20.0779389Z }, 2025-09-07T06:40:20.0780387Z { 2025-09-07T06:40:20.0780504Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0780645Z "size": 107, 2025-09-07T06:40:20.0780850Z "digest": "sha256:4fa7b9266ac51619223af08624f19dce4b7d1dc0a61bb8c2e5988b893c6d70a1" 2025-09-07T06:40:20.0781011Z }, 2025-09-07T06:40:20.0781077Z { 2025-09-07T06:40:20.0781190Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0781331Z "size": 166, 2025-09-07T06:40:20.0781474Z "digest": "sha256:96da3351f8428b796952af029c90aee281f004276ea90b18e218fff582bf409a" 2025-09-07T06:40:20.0781632Z }, 2025-09-07T06:40:20.0781700Z { 2025-09-07T06:40:20.0782733Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0782881Z "size": 2935895, 2025-09-07T06:40:20.0783027Z "digest": 
"sha256:6004d474c5463aa044b392956e9e88325038136a49c9169fba705e2a72148f35" 2025-09-07T06:40:20.0783181Z }, 2025-09-07T06:40:20.0783247Z { 2025-09-07T06:40:20.0783423Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0783564Z "size": 107, 2025-09-07T06:40:20.0783711Z "digest": "sha256:d52b1118c4f366c0f69f64208f225eba56749f199802befd21b36e4054819601" 2025-09-07T06:40:20.0783870Z }, 2025-09-07T06:40:20.0783937Z { 2025-09-07T06:40:20.0784050Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0785116Z "size": 828, 2025-09-07T06:40:20.0785264Z "digest": "sha256:90710e19b4303a37da4fde82eed1dc78966f7904419f77bee203caec68b38dc6" 2025-09-07T06:40:20.0785424Z }, 2025-09-07T06:40:20.0785490Z { 2025-09-07T06:40:20.0785604Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0785749Z "size": 25811776, 2025-09-07T06:40:20.0785902Z "digest": "sha256:e8f510972d1a9d2d7d58d5047bd16e0761df5ae89d0900cb3737a37ece65ba9c" 2025-09-07T06:40:20.0786061Z }, 2025-09-07T06:40:20.0786128Z { 2025-09-07T06:40:20.0786241Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0786386Z "size": 104, 2025-09-07T06:40:20.0787534Z "digest": "sha256:705289c9d65eb4dc74002683aeb6bf44a1fbb7a595d359b70132eced0a396222" 2025-09-07T06:40:20.0787700Z }, 2025-09-07T06:40:20.0787766Z { 2025-09-07T06:40:20.0787877Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0788017Z "size": 425, 2025-09-07T06:40:20.0788161Z "digest": "sha256:aa3590d8b0d480b1152679e07759b95d93710df177747ef8685d3c64dd968a80" 2025-09-07T06:40:20.0788318Z }, 2025-09-07T06:40:20.0788385Z { 2025-09-07T06:40:20.0788498Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0788640Z "size": 19279569, 2025-09-07T06:40:20.0788793Z "digest": "sha256:b790e385f849ac4fe803ffe765d93d9e52dfaf532dc3d7e77560d9361d25e4c4" 2025-09-07T06:40:20.0789883Z }, 
2025-09-07T06:40:20.0789954Z { 2025-09-07T06:40:20.0790065Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0790209Z "size": 639, 2025-09-07T06:40:20.0790353Z "digest": "sha256:7f422bd7611be8b204f81a16d84217dc35125fb6ae4ab3912f7ac883afb9d143" 2025-09-07T06:40:20.0790515Z }, 2025-09-07T06:40:20.0790582Z { 2025-09-07T06:40:20.0790695Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0790838Z "size": 724, 2025-09-07T06:40:20.0790985Z "digest": "sha256:11edb6ea0bca3be307ef836b0bd07999ff562bcb7a807f5e6c9f7d4d5f976b5d" 2025-09-07T06:40:20.0791149Z }, 2025-09-07T06:40:20.0792133Z { 2025-09-07T06:40:20.0792249Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0792389Z "size": 148, 2025-09-07T06:40:20.0792533Z "digest": "sha256:8e1167399aca20f7e0ac2500fb735ae2974a9f062a74e024b8f1d5f8b4faf6bd" 2025-09-07T06:40:20.0792693Z }, 2025-09-07T06:40:20.0792760Z { 2025-09-07T06:40:20.0792872Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0793013Z "size": 136, 2025-09-07T06:40:20.0793160Z "digest": "sha256:d157c63b23dc8bf7981fbe850c79a1eb2960e26ebc0344757c7f98665b7d686d" 2025-09-07T06:40:20.0793320Z }, 2025-09-07T06:40:20.0793455Z { 2025-09-07T06:40:20.0793568Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0794691Z "size": 140, 2025-09-07T06:40:20.0794835Z "digest": "sha256:ec0c7fb1e2871ad72728e420724306dd5bd0c969056666bdb0cf8c299197ac60" 2025-09-07T06:40:20.0794996Z }, 2025-09-07T06:40:20.0795062Z { 2025-09-07T06:40:20.0795174Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0795314Z "size": 32, 2025-09-07T06:40:20.0795461Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T06:40:20.0795622Z }, 2025-09-07T06:40:20.0795689Z { 2025-09-07T06:40:20.0795799Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0795939Z "size": 223, 2025-09-07T06:40:20.0797127Z "digest": "sha256:1e33591be1b2caebfdb79544377eb56d04ffc3f9859ae7bff1d0d319078a9440" 2025-09-07T06:40:20.0797365Z }, 2025-09-07T06:40:20.0797434Z { 2025-09-07T06:40:20.0797551Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0797692Z "size": 347, 2025-09-07T06:40:20.0797919Z "digest": "sha256:7988968c85d36793ccaf1bf4382d8df2dced38c1c0eddab511d0d4770f1d4b0a" 2025-09-07T06:40:20.0798081Z }, 2025-09-07T06:40:20.0798148Z { 2025-09-07T06:40:20.0798260Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0798403Z "size": 88301, 2025-09-07T06:40:20.0798555Z "digest": "sha256:91a6ca8580eebcfcf0791ff40d3862a199ec745c737eee389bcd2988e2d8ade2" 2025-09-07T06:40:20.0799661Z }, 2025-09-07T06:40:20.0799734Z { 2025-09-07T06:40:20.0799848Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0799989Z "size": 106, 2025-09-07T06:40:20.0800136Z "digest": "sha256:b14b67d48e1dbd915a6da5b1d21a7b6eda25f453044993909b2f4843ab3fe279" 2025-09-07T06:40:20.0800303Z }, 2025-09-07T06:40:20.0800370Z { 2025-09-07T06:40:20.0800485Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0800627Z "size": 1665, 2025-09-07T06:40:20.0800773Z "digest": "sha256:b192db73bfe1da14e42a568d19255310ed73477fcc8608b344f4b5e6baf6d8ac" 2025-09-07T06:40:20.0800933Z }, 2025-09-07T06:40:20.0801896Z { 2025-09-07T06:40:20.0802012Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0802153Z "size": 724, 2025-09-07T06:40:20.0802302Z "digest": "sha256:11edb6ea0bca3be307ef836b0bd07999ff562bcb7a807f5e6c9f7d4d5f976b5d" 2025-09-07T06:40:20.0802467Z }, 2025-09-07T06:40:20.0802534Z { 2025-09-07T06:40:20.0802645Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0802787Z "size": 138, 
2025-09-07T06:40:20.0802932Z "digest": "sha256:2150bf3310869a53ad8f3c61553cf421e8166d06bac5badf027d5a8c8c27293d" 2025-09-07T06:40:20.0803098Z }, 2025-09-07T06:40:20.0803165Z { 2025-09-07T06:40:20.0804163Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0804309Z "size": 120, 2025-09-07T06:40:20.0804454Z "digest": "sha256:733c737688573b91b5f7356eb0e1841ec926e30ea6dc7b6c947d3fe38037f4ba" 2025-09-07T06:40:20.0804612Z }, 2025-09-07T06:40:20.0804679Z { 2025-09-07T06:40:20.0804790Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0804933Z "size": 5387023681, 2025-09-07T06:40:20.0805086Z "digest": "sha256:b61655c9140ac745bb4ede98af919309736dfcf4205c1a8d8e741481940d93fd" 2025-09-07T06:40:20.0805247Z }, 2025-09-07T06:40:20.0805315Z { 2025-09-07T06:40:20.0805427Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0806450Z "size": 174, 2025-09-07T06:40:20.0807002Z "digest": "sha256:e86fed99150a751abc2fc621813b3c422f30ef2b69140ed8ecf6dac1d67a9e51" 2025-09-07T06:40:20.0807167Z }, 2025-09-07T06:40:20.0807237Z { 2025-09-07T06:40:20.0807351Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0807491Z "size": 1897, 2025-09-07T06:40:20.0807714Z "digest": "sha256:81d56f8c0e9e96289a7250e45d9cc29663b554c56251c1fee391de9bebf0c201" 2025-09-07T06:40:20.0807875Z }, 2025-09-07T06:40:20.0807942Z { 2025-09-07T06:40:20.0808056Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0808201Z "size": 162692045, 2025-09-07T06:40:20.0809349Z "digest": "sha256:c3379e965cca765342286c4f615b0ee640287447f7077d915d485ee09f5e0567" 2025-09-07T06:40:20.0809512Z }, 2025-09-07T06:40:20.0809579Z { 2025-09-07T06:40:20.0809691Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0809831Z "size": 302, 2025-09-07T06:40:20.0809974Z "digest": 
"sha256:ca8d286566c546411a1541f9e11d22786fae23a1d8004f8e69e283ce89ad2916" 2025-09-07T06:40:20.0810131Z }, 2025-09-07T06:40:20.0810198Z { 2025-09-07T06:40:20.0810356Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0810497Z "size": 32, 2025-09-07T06:40:20.0810646Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-09-07T06:40:20.0811736Z }, 2025-09-07T06:40:20.0811807Z { 2025-09-07T06:40:20.0811919Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0812060Z "size": 108, 2025-09-07T06:40:20.0812203Z "digest": "sha256:ac22ab96541095de08a583bdf8af502c383a8dd4f25afa1712460ffc31c81514" 2025-09-07T06:40:20.0812365Z }, 2025-09-07T06:40:20.0812431Z { 2025-09-07T06:40:20.0812544Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-09-07T06:40:20.0812686Z "size": 54145699, 2025-09-07T06:40:20.0812838Z "digest": "sha256:6fd45892b6e79a95321086f1d7842aeaf6fd2adaa82c3c8439efef9a5d79fb8d" 2025-09-07T06:40:20.0813000Z } 2025-09-07T06:40:20.0813960Z ] 2025-09-07T06:40:20.0814034Z } 2025-09-07T06:40:20.0814114Z + exit 0 2025-09-07T06:40:20.0840417Z ##[group]Run set -eux 2025-09-07T06:40:20.0840581Z set -eux 2025-09-07T06:40:20.0840738Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-09-07T06:40:20.0841140Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-09-07T06:40:20.0846889Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:20.0847029Z env: 2025-09-07T06:40:20.0847115Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:20.0849146Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:20.0849324Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:20.0849482Z RUNNER_DOCS_DIR: 
/home/runner/_work/_temp/docs 2025-09-07T06:40:20.0849851Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:20.0850215Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:20.0850323Z AWS_REGION: us-east-1 2025-09-07T06:40:20.0850519Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:20.0850665Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:20.0852697Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:20.0852797Z ##[endgroup] 2025-09-07T06:40:20.0881718Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-09-07T06:40:20.0882063Z + jq --raw-output .SecretString 2025-09-07T06:40:20.0882395Z /home/runner/_work/_temp/658abce4-c4f6-4173-972d-9437676abb08.sh: line 3: aws: command not found 2025-09-07T06:40:20.0882764Z + docker login --username pytorchbot --password-stdin 2025-09-07T06:40:20.0882995Z + jq -r .docker_hub_readonly_token 2025-09-07T06:40:20.0984974Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:20.0995430Z + true 2025-09-07T06:40:20.1073890Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-09-07T06:40:20.1074060Z with: 2025-09-07T06:40:20.1074326Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:20.1074648Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:20.1074798Z env: 2025-09-07T06:40:20.1074892Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:20.1075030Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:20.1075207Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:20.1075369Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:20.1075745Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd 
--group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:20.1076264Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:20.1076381Z AWS_REGION: us-east-1 2025-09-07T06:40:20.1076683Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:20.1076841Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:20.1078955Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:20.1079060Z ##[endgroup] 2025-09-07T06:40:20.1106653Z ##[group]Run set -x 2025-09-07T06:40:20.1106760Z set -x 2025-09-07T06:40:20.1106844Z set +e 2025-09-07T06:40:20.1106926Z  2025-09-07T06:40:20.1107006Z login() { 2025-09-07T06:40:20.1107187Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-09-07T06:40:20.1107376Z } 2025-09-07T06:40:20.1107459Z  2025-09-07T06:40:20.1110984Z retry () { 2025-09-07T06:40:20.1111096Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-09-07T06:40:20.1111215Z } 2025-09-07T06:40:20.1111295Z  2025-09-07T06:40:20.1111391Z retry login "${DOCKER_REGISTRY}" 2025-09-07T06:40:20.1111503Z  2025-09-07T06:40:20.1111680Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-09-07T06:40:20.1111919Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-09-07T06:40:20.1112055Z  2025-09-07T06:40:20.1112133Z set -e 2025-09-07T06:40:20.1112261Z # ignore output since only exit code is used for conditional 2025-09-07T06:40:20.1112438Z # only pull docker image if it's not available locally 2025-09-07T06:40:20.1112634Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-09-07T06:40:20.1112813Z  retry docker pull "${DOCKER_IMAGE}" 2025-09-07T06:40:20.1112929Z fi 2025-09-07T06:40:20.1118476Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:40:20.1118610Z env: 2025-09-07T06:40:20.1118696Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:40:20.1118828Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:40:20.1118997Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:40:20.1119154Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:40:20.1119522Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:40:20.1119878Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:40:20.1119986Z AWS_REGION: us-east-1 2025-09-07T06:40:20.1120119Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:40:20.1120264Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:40:20.1122322Z AWS_SESSION_TOKEN: *** 2025-09-07T06:40:20.1122591Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:20.1123012Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:20.1123157Z ##[endgroup] 2025-09-07T06:40:20.1145008Z + set +e 2025-09-07T06:40:20.1145169Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:20.1145343Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:20.1147562Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:40:20.1147959Z /home/runner/_work/_temp/4c844e31-c05f-4f74-b3b7-167e66c589a8.sh: line 5: aws: command not found 2025-09-07T06:40:20.1148236Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:20.1241395Z 
Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:20.1253029Z + sleep 1 2025-09-07T06:40:21.1266021Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:21.1269009Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:40:21.1269922Z /home/runner/_work/_temp/4c844e31-c05f-4f74-b3b7-167e66c589a8.sh: line 5: aws: command not found 2025-09-07T06:40:21.1271315Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:21.1366805Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:21.1379267Z + sleep 2 2025-09-07T06:40:23.1396829Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:23.1399587Z + aws ecr get-login-password --region us-east-1 2025-09-07T06:40:23.1400947Z /home/runner/_work/_temp/4c844e31-c05f-4f74-b3b7-167e66c589a8.sh: line 5: aws: command not found 2025-09-07T06:40:23.1401778Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T06:40:23.1508787Z Error: Cannot perform an interactive login from a non TTY device 2025-09-07T06:40:23.1525301Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:23.1533114Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-09-07T06:40:23.7212325Z + IMAGE_SIZE=18063.334635734558 2025-09-07T06:40:23.7212620Z + echo 'Compressed size of image in MB: 18063.334635734558' 2025-09-07T06:40:23.7212865Z Compressed size of image in MB: 18063.334635734558 2025-09-07T06:40:23.7213071Z + set -e 2025-09-07T06:40:23.7213493Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:23.7349288Z + retry docker pull 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:23.7350551Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:40:24.2047904Z pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77: Pulling from pytorch/ci-image 2025-09-07T06:40:24.2048368Z de66fc90c55d: Pulling fs layer 2025-09-07T06:40:24.2048595Z efc45b9044a6: Pulling fs layer 2025-09-07T06:40:24.2048772Z 06ce422a41d1: Pulling fs layer 2025-09-07T06:40:24.2048940Z 673cf5ffa968: Pulling fs layer 2025-09-07T06:40:24.2049114Z 3042b077c06a: Pulling fs layer 2025-09-07T06:40:24.2049280Z ed25a020f194: Pulling fs layer 2025-09-07T06:40:24.2049445Z a5876169851f: Pulling fs layer 2025-09-07T06:40:24.2049614Z 4971bfcf31c1: Pulling fs layer 2025-09-07T06:40:24.2049788Z 4d141034e9db: Pulling fs layer 2025-09-07T06:40:24.2049951Z 11edb6ea0bca: Pulling fs layer 2025-09-07T06:40:24.2050120Z dc4852f72739: Pulling fs layer 2025-09-07T06:40:24.2050295Z 829c85269cfc: Pulling fs layer 2025-09-07T06:40:24.2050458Z 77ab4e659dd8: Pulling fs layer 2025-09-07T06:40:24.2050626Z c0da146487b6: Pulling fs layer 2025-09-07T06:40:24.2050792Z 673cf5ffa968: Waiting 2025-09-07T06:40:24.2050970Z a61c8111f466: Pulling fs layer 2025-09-07T06:40:24.2051139Z 720fb67e397f: Pulling fs layer 2025-09-07T06:40:24.2051308Z 3042b077c06a: Waiting 2025-09-07T06:40:24.2051867Z fa7524284edd: Pulling fs layer 2025-09-07T06:40:24.2052010Z 57cfc9fee363: Pulling fs layer 2025-09-07T06:40:24.2052149Z c9e2c9bad36d: Pulling fs layer 2025-09-07T06:40:24.2052289Z 8257abce8e9b: Pulling fs layer 2025-09-07T06:40:24.2052427Z 8b81352a9241: Pulling fs layer 2025-09-07T06:40:24.2052562Z ed25a020f194: Waiting 2025-09-07T06:40:24.2052689Z 5acab4245292: Pulling fs layer 2025-09-07T06:40:24.2052829Z 93616f9ff93b: Pulling fs layer 
2025-09-07T06:40:24.2052968Z 89ff93a7db63: Pulling fs layer 2025-09-07T06:40:24.2053105Z adfe896bf742: Pulling fs layer 2025-09-07T06:40:24.2053243Z 4fa7b9266ac5: Pulling fs layer 2025-09-07T06:40:24.2053376Z a5876169851f: Waiting 2025-09-07T06:40:24.2053500Z 96da3351f842: Pulling fs layer 2025-09-07T06:40:24.2053632Z 4971bfcf31c1: Waiting 2025-09-07T06:40:24.2053753Z 4d141034e9db: Waiting 2025-09-07T06:40:24.2053874Z 11edb6ea0bca: Waiting 2025-09-07T06:40:24.2053999Z 6004d474c546: Pulling fs layer 2025-09-07T06:40:24.2054138Z d52b1118c4f3: Pulling fs layer 2025-09-07T06:40:24.2054409Z 90710e19b430: Pulling fs layer 2025-09-07T06:40:24.2054543Z 829c85269cfc: Waiting 2025-09-07T06:40:24.2054668Z e8f510972d1a: Pulling fs layer 2025-09-07T06:40:24.2054806Z 705289c9d65e: Pulling fs layer 2025-09-07T06:40:24.2054943Z aa3590d8b0d4: Pulling fs layer 2025-09-07T06:40:24.2055082Z b790e385f849: Pulling fs layer 2025-09-07T06:40:24.2055221Z 7f422bd7611b: Pulling fs layer 2025-09-07T06:40:24.2055357Z 8e1167399aca: Pulling fs layer 2025-09-07T06:40:24.2071532Z d157c63b23dc: Pulling fs layer 2025-09-07T06:40:24.2071673Z ec0c7fb1e287: Pulling fs layer 2025-09-07T06:40:24.2071803Z 4f4fb700ef54: Pulling fs layer 2025-09-07T06:40:24.2073017Z 1e33591be1b2: Pulling fs layer 2025-09-07T06:40:24.2073396Z 7988968c85d3: Pulling fs layer 2025-09-07T06:40:24.2073645Z 91a6ca8580ee: Pulling fs layer 2025-09-07T06:40:24.2073988Z b14b67d48e1d: Pulling fs layer 2025-09-07T06:40:24.2074221Z b192db73bfe1: Pulling fs layer 2025-09-07T06:40:24.2074457Z 2150bf331086: Pulling fs layer 2025-09-07T06:40:24.2074765Z 733c73768857: Pulling fs layer 2025-09-07T06:40:24.2074991Z b61655c9140a: Pulling fs layer 2025-09-07T06:40:24.2075214Z e86fed99150a: Pulling fs layer 2025-09-07T06:40:24.2075504Z 81d56f8c0e9e: Pulling fs layer 2025-09-07T06:40:24.2075760Z c3379e965cca: Pulling fs layer 2025-09-07T06:40:24.2075980Z ca8d286566c5: Pulling fs layer 2025-09-07T06:40:24.2076201Z ac22ab965410: Pulling fs layer 
2025-09-07T06:40:24.2076429Z 6fd45892b6e7: Pulling fs layer 2025-09-07T06:40:24.2076794Z dc4852f72739: Waiting 2025-09-07T06:40:24.2076991Z 8b81352a9241: Waiting 2025-09-07T06:40:24.2077186Z 89ff93a7db63: Waiting 2025-09-07T06:40:24.2077377Z a61c8111f466: Waiting 2025-09-07T06:40:24.2077564Z 96da3351f842: Waiting 2025-09-07T06:40:24.2077863Z 4fa7b9266ac5: Waiting 2025-09-07T06:40:24.2078051Z 90710e19b430: Waiting 2025-09-07T06:40:24.2078241Z d52b1118c4f3: Waiting 2025-09-07T06:40:24.2078429Z 77ab4e659dd8: Waiting 2025-09-07T06:40:24.2087274Z e8f510972d1a: Waiting 2025-09-07T06:40:24.2087432Z 7f422bd7611b: Waiting 2025-09-07T06:40:24.2087586Z b790e385f849: Waiting 2025-09-07T06:40:24.2087731Z 8e1167399aca: Waiting 2025-09-07T06:40:24.2087871Z 720fb67e397f: Waiting 2025-09-07T06:40:24.2088013Z 705289c9d65e: Waiting 2025-09-07T06:40:24.2088154Z 7988968c85d3: Waiting 2025-09-07T06:40:24.2088302Z 1e33591be1b2: Waiting 2025-09-07T06:40:24.2088449Z c0da146487b6: Waiting 2025-09-07T06:40:24.2088594Z d157c63b23dc: Waiting 2025-09-07T06:40:24.2088733Z ec0c7fb1e287: Waiting 2025-09-07T06:40:24.2091023Z 91a6ca8580ee: Waiting 2025-09-07T06:40:24.2091170Z 4f4fb700ef54: Waiting 2025-09-07T06:40:24.2091298Z 733c73768857: Waiting 2025-09-07T06:40:24.2091425Z b61655c9140a: Waiting 2025-09-07T06:40:24.2091551Z c3379e965cca: Waiting 2025-09-07T06:40:24.2091682Z 93616f9ff93b: Waiting 2025-09-07T06:40:24.2091809Z adfe896bf742: Waiting 2025-09-07T06:40:24.2091940Z ca8d286566c5: Waiting 2025-09-07T06:40:24.2092070Z 5acab4245292: Waiting 2025-09-07T06:40:24.2092196Z 81d56f8c0e9e: Waiting 2025-09-07T06:40:24.2092328Z b192db73bfe1: Waiting 2025-09-07T06:40:24.2094489Z ac22ab965410: Waiting 2025-09-07T06:40:24.2094655Z e86fed99150a: Waiting 2025-09-07T06:40:24.2094755Z 8257abce8e9b: Waiting 2025-09-07T06:40:24.2094855Z 6fd45892b6e7: Waiting 2025-09-07T06:40:24.2094955Z c9e2c9bad36d: Waiting 2025-09-07T06:40:24.2095055Z 57cfc9fee363: Waiting 2025-09-07T06:40:24.4504995Z efc45b9044a6: Verifying 
Checksum 2025-09-07T06:40:24.4505358Z efc45b9044a6: Download complete 2025-09-07T06:40:24.6668378Z 673cf5ffa968: Download complete 2025-09-07T06:40:24.7814651Z de66fc90c55d: Verifying Checksum 2025-09-07T06:40:24.7815020Z de66fc90c55d: Download complete 2025-09-07T06:40:24.8974775Z 3042b077c06a: Download complete 2025-09-07T06:40:25.0188543Z ed25a020f194: Verifying Checksum 2025-09-07T06:40:25.0188788Z ed25a020f194: Download complete 2025-09-07T06:40:25.2361082Z 4971bfcf31c1: Verifying Checksum 2025-09-07T06:40:25.2361501Z 4971bfcf31c1: Download complete 2025-09-07T06:40:25.3411280Z de66fc90c55d: Pull complete 2025-09-07T06:40:25.3521108Z efc45b9044a6: Pull complete 2025-09-07T06:40:25.4525734Z 4d141034e9db: Verifying Checksum 2025-09-07T06:40:25.4525965Z 4d141034e9db: Download complete 2025-09-07T06:40:25.6858843Z 11edb6ea0bca: Verifying Checksum 2025-09-07T06:40:25.6859318Z 11edb6ea0bca: Download complete 2025-09-07T06:40:26.3006290Z a5876169851f: Verifying Checksum 2025-09-07T06:40:26.3006811Z a5876169851f: Download complete 2025-09-07T06:40:26.4999875Z 829c85269cfc: Download complete 2025-09-07T06:40:27.3306611Z 77ab4e659dd8: Verifying Checksum 2025-09-07T06:40:27.3312004Z 77ab4e659dd8: Download complete 2025-09-07T06:40:27.5581604Z c0da146487b6: Download complete 2025-09-07T06:40:27.7679404Z a61c8111f466: Verifying Checksum 2025-09-07T06:40:27.7679633Z a61c8111f466: Download complete 2025-09-07T06:40:27.7752816Z 06ce422a41d1: Download complete 2025-09-07T06:40:27.9740065Z 720fb67e397f: Download complete 2025-09-07T06:40:28.1831226Z 57cfc9fee363: Verifying Checksum 2025-09-07T06:40:28.1831677Z 57cfc9fee363: Download complete 2025-09-07T06:40:28.4242958Z c9e2c9bad36d: Download complete 2025-09-07T06:40:32.9099097Z 06ce422a41d1: Pull complete 2025-09-07T06:40:32.9185663Z 673cf5ffa968: Pull complete 2025-09-07T06:40:32.9294900Z 3042b077c06a: Pull complete 2025-09-07T06:40:32.9381260Z ed25a020f194: Pull complete 2025-09-07T06:40:33.4183171Z fa7524284edd: Verifying 
Checksum 2025-09-07T06:40:33.4183617Z fa7524284edd: Download complete 2025-09-07T06:40:33.6495303Z 8b81352a9241: Download complete 2025-09-07T06:40:33.8675550Z 5acab4245292: Download complete 2025-09-07T06:40:34.2636147Z a5876169851f: Pull complete 2025-09-07T06:40:34.2733187Z 4971bfcf31c1: Pull complete 2025-09-07T06:40:34.2858691Z 4d141034e9db: Pull complete 2025-09-07T06:40:34.2948002Z 11edb6ea0bca: Pull complete 2025-09-07T06:40:41.8088442Z 93616f9ff93b: Verifying Checksum 2025-09-07T06:40:41.8096855Z 93616f9ff93b: Download complete 2025-09-07T06:40:42.0566195Z 89ff93a7db63: Download complete 2025-09-07T06:40:42.2726292Z adfe896bf742: Verifying Checksum 2025-09-07T06:40:42.2726764Z adfe896bf742: Download complete 2025-09-07T06:40:42.4820163Z 4fa7b9266ac5: Verifying Checksum 2025-09-07T06:40:42.6692142Z 96da3351f842: Verifying Checksum 2025-09-07T06:40:42.6692500Z 96da3351f842: Download complete 2025-09-07T06:40:43.0656231Z 6004d474c546: Verifying Checksum 2025-09-07T06:40:43.0656726Z 6004d474c546: Download complete 2025-09-07T06:40:43.2864074Z d52b1118c4f3: Download complete 2025-09-07T06:40:43.4884869Z 90710e19b430: Verifying Checksum 2025-09-07T06:40:43.4885406Z 90710e19b430: Download complete 2025-09-07T06:40:44.4510147Z e8f510972d1a: Verifying Checksum 2025-09-07T06:40:44.4510433Z e8f510972d1a: Download complete 2025-09-07T06:40:44.6578519Z 705289c9d65e: Download complete 2025-09-07T06:40:44.8921047Z aa3590d8b0d4: Verifying Checksum 2025-09-07T06:40:44.8921514Z aa3590d8b0d4: Download complete 2025-09-07T06:40:45.3836053Z b790e385f849: Verifying Checksum 2025-09-07T06:40:45.3836271Z b790e385f849: Download complete 2025-09-07T06:40:45.8433673Z 8e1167399aca: Verifying Checksum 2025-09-07T06:40:45.8434195Z 8e1167399aca: Download complete 2025-09-07T06:40:46.0695972Z d157c63b23dc: Download complete 2025-09-07T06:40:46.3836285Z 4f4fb700ef54: Verifying Checksum 2025-09-07T06:40:46.3836640Z 4f4fb700ef54: Download complete 2025-09-07T06:40:46.5824060Z 1e33591be1b2: 
Verifying Checksum 2025-09-07T06:40:46.5829402Z 1e33591be1b2: Download complete 2025-09-07T06:40:46.8037819Z 7988968c85d3: Verifying Checksum 2025-09-07T06:40:46.8037987Z 7988968c85d3: Download complete 2025-09-07T06:40:47.0570951Z 91a6ca8580ee: Verifying Checksum 2025-09-07T06:40:47.0571127Z 91a6ca8580ee: Download complete 2025-09-07T06:40:47.2731606Z b14b67d48e1d: Verifying Checksum 2025-09-07T06:40:47.2731916Z b14b67d48e1d: Download complete 2025-09-07T06:40:47.4796646Z b192db73bfe1: Download complete 2025-09-07T06:40:47.7401326Z 2150bf331086: Download complete 2025-09-07T06:40:47.9706594Z 733c73768857: Verifying Checksum 2025-09-07T06:40:47.9706756Z 733c73768857: Download complete 2025-09-07T06:41:00.0261090Z dc4852f72739: Verifying Checksum 2025-09-07T06:41:00.0262120Z dc4852f72739: Download complete 2025-09-07T06:41:00.2535373Z e86fed99150a: Download complete 2025-09-07T06:41:00.4598770Z 81d56f8c0e9e: Verifying Checksum 2025-09-07T06:41:00.4608681Z 81d56f8c0e9e: Download complete 2025-09-07T06:41:02.2672993Z c3379e965cca: Verifying Checksum 2025-09-07T06:41:02.2673333Z c3379e965cca: Download complete 2025-09-07T06:41:02.4548337Z ca8d286566c5: Download complete 2025-09-07T06:41:02.6615383Z ac22ab965410: Verifying Checksum 2025-09-07T06:41:02.6615776Z ac22ab965410: Download complete 2025-09-07T06:41:03.3911938Z 6fd45892b6e7: Verifying Checksum 2025-09-07T06:41:03.3912343Z 6fd45892b6e7: Download complete 2025-09-07T06:41:26.8518642Z dc4852f72739: Pull complete 2025-09-07T06:41:27.5064826Z 829c85269cfc: Pull complete 2025-09-07T06:41:28.6967911Z 77ab4e659dd8: Pull complete 2025-09-07T06:41:28.9956579Z c0da146487b6: Pull complete 2025-09-07T06:41:29.1608068Z a61c8111f466: Pull complete 2025-09-07T06:41:29.1864837Z 720fb67e397f: Pull complete 2025-09-07T06:41:33.0168249Z fa7524284edd: Pull complete 2025-09-07T06:41:33.0301390Z 57cfc9fee363: Pull complete 2025-09-07T06:41:33.0406839Z c9e2c9bad36d: Pull complete 2025-09-07T06:41:42.0315261Z b61655c9140a: Verifying 
Checksum 2025-09-07T06:41:42.0315712Z b61655c9140a: Download complete 2025-09-07T06:41:52.9296281Z 8257abce8e9b: Verifying Checksum 2025-09-07T06:41:52.9296912Z 8257abce8e9b: Download complete 2025-09-07T06:42:47.8925524Z 8257abce8e9b: Pull complete 2025-09-07T06:42:48.7552816Z 8b81352a9241: Pull complete 2025-09-07T06:42:49.5163635Z 5acab4245292: Pull complete 2025-09-07T06:42:54.8199226Z 93616f9ff93b: Pull complete 2025-09-07T06:42:54.8314630Z 89ff93a7db63: Pull complete 2025-09-07T06:42:54.8418089Z adfe896bf742: Pull complete 2025-09-07T06:42:54.8512385Z 4fa7b9266ac5: Pull complete 2025-09-07T06:42:54.8601493Z 96da3351f842: Pull complete 2025-09-07T06:42:54.8943384Z 6004d474c546: Pull complete 2025-09-07T06:42:54.9029801Z d52b1118c4f3: Pull complete 2025-09-07T06:42:54.9135428Z 90710e19b430: Pull complete 2025-09-07T06:42:55.1452594Z e8f510972d1a: Pull complete 2025-09-07T06:42:55.1547656Z 705289c9d65e: Pull complete 2025-09-07T06:42:55.1638415Z aa3590d8b0d4: Pull complete 2025-09-07T06:42:55.2752446Z b790e385f849: Pull complete 2025-09-07T06:42:55.2843878Z 7f422bd7611b: Pull complete 2025-09-07T06:42:55.3028625Z 8e1167399aca: Pull complete 2025-09-07T06:42:55.3150843Z d157c63b23dc: Pull complete 2025-09-07T06:42:55.3249530Z ec0c7fb1e287: Pull complete 2025-09-07T06:42:55.3332453Z 4f4fb700ef54: Pull complete 2025-09-07T06:42:55.3423343Z 1e33591be1b2: Pull complete 2025-09-07T06:42:55.3518324Z 7988968c85d3: Pull complete 2025-09-07T06:42:55.3645434Z 91a6ca8580ee: Pull complete 2025-09-07T06:42:55.3736355Z b14b67d48e1d: Pull complete 2025-09-07T06:42:55.3826565Z b192db73bfe1: Pull complete 2025-09-07T06:42:55.4006323Z 2150bf331086: Pull complete 2025-09-07T06:42:55.4117010Z 733c73768857: Pull complete 2025-09-07T06:43:31.7861309Z b61655c9140a: Pull complete 2025-09-07T06:43:31.7959764Z e86fed99150a: Pull complete 2025-09-07T06:43:31.8054331Z 81d56f8c0e9e: Pull complete 2025-09-07T06:43:34.6114775Z c3379e965cca: Pull complete 2025-09-07T06:43:34.6204557Z 
ca8d286566c5: Pull complete 2025-09-07T06:43:34.6369663Z ac22ab965410: Pull complete 2025-09-07T06:43:35.7946910Z 6fd45892b6e7: Pull complete 2025-09-07T06:43:36.1927848Z Digest: sha256:9e860fd68cd38c78fd118edc6e8b6bbf754ab60affb0c456f2b2696bd7ea79a5 2025-09-07T06:43:36.3469714Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:43:36.3817374Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:43:36.3875978Z Prepare all required actions 2025-09-07T06:43:36.3899906Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-09-07T06:43:36.3900040Z with: 2025-09-07T06:43:36.3900375Z github-token: *** 2025-09-07T06:43:36.3900463Z env: 2025-09-07T06:43:36.3900710Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:36.3900845Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:36.3901024Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:36.3901194Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:36.3901582Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:36.3901960Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:36.3902078Z AWS_REGION: us-east-1 2025-09-07T06:43:36.3902216Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:36.3904715Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:36.3906959Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:36.3907059Z ##[endgroup] 2025-09-07T06:43:36.3918061Z ##[group]Run set -eux 2025-09-07T06:43:36.3918172Z set -eux 2025-09-07T06:43:36.3918349Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-09-07T06:43:36.3924218Z shell: /usr/bin/bash 
--noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:36.3924367Z env: 2025-09-07T06:43:36.3924462Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:36.3924602Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:36.3927381Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:36.3927555Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:36.3927932Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:36.3928291Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:36.3928400Z AWS_REGION: us-east-1 2025-09-07T06:43:36.3928530Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:36.3928742Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:36.3930849Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:36.3930996Z GITHUB_TOKEN: *** 2025-09-07T06:43:36.3931086Z ##[endgroup] 2025-09-07T06:43:36.3961360Z + python3 .github/scripts/get_workflow_job_id.py 17524754565 linux.rocm.gpu.gfx942.1-xb8kr-runner-hql9s 2025-09-07T06:43:36.8476009Z Setting output job-id=49774353529 2025-09-07T06:43:36.8476428Z Setting output job-name=linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:43:36.8608408Z Prepare all required actions 2025-09-07T06:43:36.8608605Z Getting action download info 2025-09-07T06:43:37.0745364Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-09-07T06:43:37.4571473Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-09-07T06:43:38.5569140Z ##[group]Run ./.github/actions/download-build-artifacts 2025-09-07T06:43:38.5569291Z with: 2025-09-07T06:43:38.5569387Z name: linux-noble-rocm-py3.12-mi300 2025-09-07T06:43:38.5569524Z s3-bucket: gha-artifacts 2025-09-07T06:43:38.5569621Z env: 
2025-09-07T06:43:38.5569703Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:38.5569832Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:38.5570001Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:38.5570154Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:38.5570555Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:38.5570921Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:38.5571025Z AWS_REGION: us-east-1 2025-09-07T06:43:38.5571201Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:38.5571345Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:38.5573452Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:38.5573548Z ##[endgroup] 2025-09-07T06:43:38.5591008Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T06:43:38.5591133Z with: 2025-09-07T06:43:38.5591229Z name: linux-noble-rocm-py3.12-mi300 2025-09-07T06:43:38.5591347Z s3-bucket: gha-artifacts 2025-09-07T06:43:38.5591447Z region: us-east-1 2025-09-07T06:43:38.5591532Z env: 2025-09-07T06:43:38.5591616Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:38.5591740Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:38.5591910Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:38.5592064Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:38.5592434Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:38.5592785Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:38.5592893Z AWS_REGION: us-east-1 2025-09-07T06:43:38.5593024Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:38.5593183Z AWS_SECRET_ACCESS_KEY: *** 
2025-09-07T06:43:38.5595239Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:38.5595335Z ##[endgroup] 2025-09-07T06:43:38.7943140Z (node:4933) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T06:43:38.7943487Z 2025-09-07T06:43:38.7943639Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T06:43:38.7944052Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T06:43:38.7944423Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T06:43:38.9781204Z Found 1 objects with prefix pytorch/pytorch/17524754565/linux-noble-rocm-py3.12-mi300/ 2025-09-07T06:43:38.9781891Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-09-07T06:43:47.4654725Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-09-07T06:43:47.4656834Z Artifact download has finished successfully 2025-09-07T06:43:47.4942117Z ##[group]Run unzip -o artifacts.zip 2025-09-07T06:43:47.4942277Z unzip -o artifacts.zip 2025-09-07T06:43:47.4950698Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:47.4950842Z env: 2025-09-07T06:43:47.4950929Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:47.4951059Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:47.4951430Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:47.4951589Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:47.4951963Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:47.4952325Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:47.4952434Z AWS_REGION: us-east-1 2025-09-07T06:43:47.4952627Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:47.4952797Z AWS_SECRET_ACCESS_KEY: *** 
2025-09-07T06:43:47.4954839Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:47.4954942Z ##[endgroup] 2025-09-07T06:43:47.4988119Z Archive: artifacts.zip 2025-09-07T06:43:47.4990200Z creating: dist/ 2025-09-07T06:43:49.0400187Z inflating: dist/torch-2.9.0a0+git93fb23d-cp312-cp312-linux_x86_64.whl 2025-09-07T06:43:49.0483763Z inflating: dist/.ninja_log 2025-09-07T06:43:49.0485180Z creating: build/custom_test_artifacts/ 2025-09-07T06:43:49.0494471Z creating: build/custom_test_artifacts/custom-op-build/ 2025-09-07T06:43:49.0494887Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-09-07T06:43:49.0495307Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-09-07T06:43:49.0495803Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T06:43:49.0496246Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/ 2025-09-07T06:43:49.0497176Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T06:43:49.0497631Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T06:43:49.0498087Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T06:43:49.0498608Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T06:43:49.0499128Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T06:43:49.0499611Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T06:43:49.0500091Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T06:43:49.0500552Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T06:43:49.0501079Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T06:43:49.0501531Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T06:43:49.0501900Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T06:43:49.0502303Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T06:43:49.0502723Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T06:43:49.0503083Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-09-07T06:43:49.0503383Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-09-07T06:43:49.0503692Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-09-07T06:43:49.0504023Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-09-07T06:43:49.0504396Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-09-07T06:43:49.0504797Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-09-07T06:43:49.0505173Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-09-07T06:43:49.0505663Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-09-07T06:43:49.0506029Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-09-07T06:43:49.0506395Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-09-07T06:43:49.0509517Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-09-07T06:43:49.0509887Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-09-07T06:43:49.0510263Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-09-07T06:43:49.0511678Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-09-07T06:43:49.0633116Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-09-07T06:43:49.0633666Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.d 2025-09-07T06:43:49.0634103Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-09-07T06:43:49.0634573Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-09-07T06:43:49.0635099Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-09-07T06:43:49.0635597Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-09-07T06:43:49.0642677Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-09-07T06:43:49.0643128Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-09-07T06:43:49.0643537Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-09-07T06:43:49.0643950Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-09-07T06:43:49.0644351Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-09-07T06:43:49.0644742Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-09-07T06:43:49.0648119Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 
2025-09-07T06:43:49.0696672Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-09-07T06:43:49.0697030Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.d 2025-09-07T06:43:49.0697356Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T06:43:49.0697674Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-09-07T06:43:49.0700361Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-09-07T06:43:49.0700630Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-09-07T06:43:49.0700888Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-09-07T06:43:49.0701160Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/InstallScripts.json 2025-09-07T06:43:49.0701435Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_outer_vec.cc 2025-09-07T06:43:49.0701698Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_vec_ext.cc 2025-09-07T06:43:49.0701936Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-09-07T06:43:49.0702157Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-09-07T06:43:49.0702377Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-09-07T06:43:49.0808833Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-09-07T06:43:49.0845259Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-09-07T06:43:49.0845491Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-09-07T06:43:49.0845675Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-09-07T06:43:49.0845886Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-09-07T06:43:49.0846130Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T06:43:49.0846373Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/ 2025-09-07T06:43:49.0846728Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T06:43:49.0846976Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T06:43:49.0847224Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T06:43:49.0848306Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T06:43:49.0848667Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T06:43:49.0848943Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T06:43:49.0849205Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T06:43:49.0850323Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T06:43:49.0855503Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T06:43:49.0855833Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T06:43:49.0856118Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T06:43:49.0856413Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T06:43:49.0856831Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T06:43:49.0857101Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-09-07T06:43:49.0857319Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 
2025-09-07T06:43:49.0857546Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-09-07T06:43:49.0857790Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-09-07T06:43:49.0858054Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-09-07T06:43:49.0860137Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-09-07T06:43:49.0860443Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-09-07T06:43:49.0860711Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-09-07T06:43:49.0860985Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-09-07T06:43:49.0861258Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-09-07T06:43:49.0861533Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-09-07T06:43:49.0861808Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-09-07T06:43:49.0862079Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-09-07T06:43:49.0867905Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-09-07T06:43:49.0905922Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-09-07T06:43:49.0907220Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.d 2025-09-07T06:43:49.0907573Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T06:43:49.0907854Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-09-07T06:43:49.0913514Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-09-07T06:43:49.0913906Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-09-07T06:43:49.0914281Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-09-07T06:43:49.0914577Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/InstallScripts.json 2025-09-07T06:43:49.0914836Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_outer_vec.cc 2025-09-07T06:43:49.0915069Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_vec_ext.cc 2025-09-07T06:43:49.0915284Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-09-07T06:43:49.0915483Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-09-07T06:43:49.0915689Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-09-07T06:43:49.0933753Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-09-07T06:43:49.0936260Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-09-07T06:43:49.0936904Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-09-07T06:43:49.0937327Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-09-07T06:43:49.0937866Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-09-07T06:43:49.0938293Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/ 2025-09-07T06:43:49.0938723Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeSystem.cmake 2025-09-07T06:43:49.0939179Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/ 2025-09-07T06:43:49.0939632Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/tmp/ 2025-09-07T06:43:49.0940142Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/CMakeCCompilerId.c 2025-09-07T06:43:49.0940662Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdC/a.out 2025-09-07T06:43:49.0947978Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCCompiler.cmake 2025-09-07T06:43:49.0948412Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/ 2025-09-07T06:43:49.0948796Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/tmp/ 2025-09-07T06:43:49.0949253Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-09-07T06:43:49.0949693Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CompilerIdCXX/a.out 2025-09-07T06:43:49.0950095Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeCXXCompiler.cmake 2025-09-07T06:43:49.0950529Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_C.bin 2025-09-07T06:43:49.0950999Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/4.0.0/CMakeDetermineCompilerABI_CXX.bin 2025-09-07T06:43:49.0951404Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-09-07T06:43:49.0952030Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-09-07T06:43:49.0952373Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-09-07T06:43:49.0952740Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-09-07T06:43:49.0953155Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 
2025-09-07T06:43:49.0953620Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-09-07T06:43:49.0955882Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-09-07T06:43:49.0956226Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-09-07T06:43:49.0956637Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-09-07T06:43:49.0956979Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-09-07T06:43:49.0957402Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-09-07T06:43:49.0957730Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-09-07T06:43:49.0958060Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-09-07T06:43:49.0958413Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-09-07T06:43:49.1022449Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-09-07T06:43:49.1022770Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.d 2025-09-07T06:43:49.1023048Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-09-07T06:43:49.1025680Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-09-07T06:43:49.1026019Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-09-07T06:43:49.1026349Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-09-07T06:43:49.1026716Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-09-07T06:43:49.1027032Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-09-07T06:43:49.1027344Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-09-07T06:43:49.1027656Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-09-07T06:43:49.1027974Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-09-07T06:43:49.1028288Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-09-07T06:43:49.1037552Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-09-07T06:43:49.1070364Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-09-07T06:43:49.1070723Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.d 2025-09-07T06:43:49.1071029Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-09-07T06:43:49.1073465Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-09-07T06:43:49.1073836Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-09-07T06:43:49.1074099Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-09-07T06:43:49.1074337Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-09-07T06:43:49.1074593Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/InstallScripts.json 2025-09-07T06:43:49.1074849Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_outer_vec.cc 2025-09-07T06:43:49.1075094Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_vec_ext.cc 2025-09-07T06:43:49.1075323Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-09-07T06:43:49.1075530Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-09-07T06:43:49.1075753Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-09-07T06:43:49.1141183Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-09-07T06:43:49.1163106Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-09-07T06:43:49.1163301Z creating: build/lib/ 2025-09-07T06:43:49.1214130Z inflating: build/lib/libprotobuf-lite.a 2025-09-07T06:43:49.1493720Z inflating: build/lib/libprotobuf.a 2025-09-07T06:43:49.1805888Z inflating: build/lib/libprotoc.a 2025-09-07T06:43:49.1811650Z inflating: build/lib/libpthreadpool.a 2025-09-07T06:43:49.1816975Z inflating: build/lib/libcpuinfo.a 2025-09-07T06:43:49.1821342Z inflating: build/lib/libcpuinfo_internals.a 2025-09-07T06:43:49.1821773Z inflating: build/lib/libclog.a 2025-09-07T06:43:49.1833688Z inflating: build/lib/libpytorch_qnnpack.a 2025-09-07T06:43:49.1834665Z inflating: build/lib/libnnpack_reference_layers.a 2025-09-07T06:43:49.1951722Z inflating: build/lib/libmicrokernels-prod.a 2025-09-07T06:43:49.1962780Z inflating: build/lib/libnnpack.a 2025-09-07T06:43:49.2514342Z inflating: build/lib/libmicrokernels-all.a 2025-09-07T06:43:49.2559609Z inflating: build/lib/libgtest.a 2025-09-07T06:43:49.2570248Z inflating: build/lib/libgmock.a 2025-09-07T06:43:49.2570462Z inflating: build/lib/libgtest_main.a 2025-09-07T06:43:49.2570627Z inflating: build/lib/libgmock_main.a 2025-09-07T06:43:49.2626621Z 
inflating: build/lib/libXNNPACK.a 2025-09-07T06:43:49.2673389Z inflating: build/lib/libbenchmark.a 2025-09-07T06:43:49.2673582Z inflating: build/lib/libbenchmark_main.a 2025-09-07T06:43:49.2714748Z inflating: build/lib/libasmjit.a 2025-09-07T06:43:49.2719595Z inflating: build/lib/libittnotify.a 2025-09-07T06:43:49.3452449Z inflating: build/lib/libfbgemm.a 2025-09-07T06:43:49.3452869Z inflating: build/lib/libjitprofiling.a 2025-09-07T06:43:49.3471229Z inflating: build/lib/libtensorpipe_uv.a 2025-09-07T06:43:49.3814936Z inflating: build/lib/libtensorpipe.a 2025-09-07T06:43:49.3889476Z inflating: build/lib/libgloo.a 2025-09-07T06:43:49.3918658Z inflating: build/lib/libonnx_proto.a 2025-09-07T06:43:49.4179372Z inflating: build/lib/libgloo_hip.a 2025-09-07T06:43:49.4623866Z inflating: build/lib/libonnx.a 2025-09-07T06:43:50.0959964Z inflating: build/lib/libdnnl.a 2025-09-07T06:43:50.0971004Z inflating: build/lib/libfmt.a 2025-09-07T06:43:50.1156346Z inflating: build/lib/libkineto.a 2025-09-07T06:43:50.1225982Z inflating: build/lib/libc10.so 2025-09-07T06:43:50.1226287Z inflating: build/lib/libtorch_global_deps.so 2025-09-07T06:43:50.1226774Z inflating: build/lib/libcaffe2_nvrtc.so 2025-09-07T06:43:50.1261252Z inflating: build/lib/libc10_hip.so 2025-09-07T06:43:50.1618487Z inflating: build/lib/libfbgemm_genai.a 2025-09-07T06:43:52.0402393Z inflating: build/lib/libtorch_cpu.so 2025-09-07T06:43:52.0403401Z inflating: build/lib/libshm.so 2025-09-07T06:43:52.5694000Z inflating: build/lib/libtorch_hip.so 2025-09-07T06:43:52.5694752Z inflating: build/lib/libtorch.so 2025-09-07T06:43:52.5705275Z inflating: build/lib/libjitbackend_test.so 2025-09-07T06:43:52.5758169Z inflating: build/lib/libtorchbind_test.so 2025-09-07T06:43:52.5764033Z inflating: build/lib/libbackend_with_compiler.so 2025-09-07T06:43:52.5780380Z inflating: build/lib/libaoti_custom_ops.so 2025-09-07T06:43:52.7116606Z inflating: build/lib/libtorch_python.so 2025-09-07T06:43:52.7138676Z inflating: 
build/lib/libnnapi_backend.so 2025-09-07T06:43:52.7139688Z creating: build/bin/ 2025-09-07T06:43:52.7139985Z creating: build/bin/CMakeFiles/ 2025-09-07T06:43:52.7140198Z inflating: build/bin/cmake_install.cmake 2025-09-07T06:43:52.7140409Z inflating: build/bin/CTestTestfile.cmake 2025-09-07T06:43:52.7424663Z inflating: build/bin/protoc-3.13.0.0 2025-09-07T06:43:52.7712473Z inflating: build/bin/protoc 2025-09-07T06:43:52.7748839Z inflating: build/bin/c10_AllocatorConfig_test 2025-09-07T06:43:52.7783155Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-09-07T06:43:52.7819018Z inflating: build/bin/c10_DeviceGuard_test 2025-09-07T06:43:52.7854540Z inflating: build/bin/c10_Device_test 2025-09-07T06:43:52.7895098Z inflating: build/bin/c10_DispatchKeySet_test 2025-09-07T06:43:52.7932442Z inflating: build/bin/c10_Scalar_test 2025-09-07T06:43:52.7968968Z inflating: build/bin/c10_StreamGuard_test 2025-09-07T06:43:52.8007845Z inflating: build/bin/c10_SymInt_test 2025-09-07T06:43:52.8046363Z inflating: build/bin/c10_InlineStreamGuard_test 2025-09-07T06:43:52.8083472Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-09-07T06:43:52.8121751Z inflating: build/bin/c10_SizesAndStrides_test 2025-09-07T06:43:52.8155873Z inflating: build/bin/c10_ArrayRef_test 2025-09-07T06:43:52.8207465Z inflating: build/bin/c10_cow_test 2025-09-07T06:43:52.8239409Z inflating: build/bin/c10_Bitset_test 2025-09-07T06:43:52.8273193Z inflating: build/bin/c10_ConstexprCrc_test 2025-09-07T06:43:52.8307662Z inflating: build/bin/c10_DeadlockDetection_test 2025-09-07T06:43:52.8346460Z inflating: build/bin/c10_Enumerate_test 2025-09-07T06:43:52.8384553Z inflating: build/bin/c10_Half_test 2025-09-07T06:43:52.8420513Z inflating: build/bin/c10_IntrusiveList_test 2025-09-07T06:43:52.8458805Z inflating: build/bin/c10_LeftRight_test 2025-09-07T06:43:52.8496892Z inflating: build/bin/c10_Metaprogramming_test 2025-09-07T06:43:52.8533521Z inflating: build/bin/c10_NetworkFlow_test 
2025-09-07T06:43:52.8579098Z inflating: build/bin/c10_ThreadLocal_test 2025-09-07T06:43:52.8605702Z inflating: build/bin/c10_Semaphore_test 2025-09-07T06:43:52.8640220Z inflating: build/bin/c10_Synchronized_test 2025-09-07T06:43:52.8675114Z inflating: build/bin/c10_TypeList_test 2025-09-07T06:43:52.8710977Z inflating: build/bin/c10_TypeIndex_test 2025-09-07T06:43:52.8744925Z inflating: build/bin/c10_TypeTraits_test 2025-09-07T06:43:52.8780590Z inflating: build/bin/c10_accumulate_test 2025-09-07T06:43:52.8821850Z inflating: build/bin/c10_complex_math_test 2025-09-07T06:43:52.8860172Z inflating: build/bin/c10_bfloat16_test 2025-09-07T06:43:52.8898096Z inflating: build/bin/c10_complex_test 2025-09-07T06:43:52.8932807Z inflating: build/bin/c10_bit_cast_test 2025-09-07T06:43:52.8966972Z inflating: build/bin/c10_error_test 2025-09-07T06:43:52.9003326Z inflating: build/bin/c10_exception_test 2025-09-07T06:43:52.9040767Z inflating: build/bin/c10_flags_test 2025-09-07T06:43:52.9072936Z inflating: build/bin/c10_generic_math_test 2025-09-07T06:43:52.9108052Z inflating: build/bin/c10_irange_test 2025-09-07T06:43:52.9217531Z inflating: build/bin/c10_intrusive_ptr_test 2025-09-07T06:43:52.9254186Z inflating: build/bin/c10_lazy_test 2025-09-07T06:43:52.9293629Z inflating: build/bin/c10_logging_test 2025-09-07T06:43:52.9344340Z inflating: build/bin/c10_optional_test 2025-09-07T06:43:52.9386403Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-09-07T06:43:52.9423234Z inflating: build/bin/c10_registry_test 2025-09-07T06:43:52.9523526Z inflating: build/bin/c10_small_vector_test 2025-09-07T06:43:52.9562280Z inflating: build/bin/c10_string_util_test 2025-09-07T06:43:52.9600354Z inflating: build/bin/c10_ssize_test 2025-09-07T06:43:52.9634008Z inflating: build/bin/c10_string_view_test 2025-09-07T06:43:52.9668691Z inflating: build/bin/c10_tempfile_test 2025-09-07T06:43:52.9699016Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-09-07T06:43:52.9737847Z inflating: 
build/bin/c10_typeid_test 2025-09-07T06:43:52.9771773Z inflating: build/bin/c10_hip_HIPAssertionsTest_1_var_test 2025-09-07T06:43:52.9805575Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_stream 2025-09-07T06:43:52.9839749Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-09-07T06:43:52.9873503Z inflating: build/bin/c10_hip_HIPAssertionsTest_from_2_processes 2025-09-07T06:43:52.9907345Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 2025-09-07T06:43:52.9941248Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-09-07T06:43:52.9975072Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-09-07T06:43:53.0012447Z inflating: build/bin/c10_hip_HIPTest 2025-09-07T06:43:53.0388087Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-09-07T06:43:53.0774303Z inflating: build/bin/vec_test_all_types_AVX512 2025-09-07T06:43:53.1164414Z inflating: build/bin/vec_test_all_types_AVX2 2025-09-07T06:43:53.1200087Z inflating: build/bin/BackoffTest 2025-09-07T06:43:53.1236691Z inflating: build/bin/FileStoreTest 2025-09-07T06:43:53.1275377Z inflating: build/bin/TCPStoreTest 2025-09-07T06:43:53.1314026Z inflating: build/bin/HashStoreTest 2025-09-07T06:43:53.1362569Z inflating: build/bin/ProcessGroupGlooTest 2025-09-07T06:43:53.1367378Z inflating: build/bin/example_allreduce 2025-09-07T06:43:53.1367625Z inflating: build/bin/torch_shm_manager 2025-09-07T06:43:53.1402965Z inflating: build/bin/static_runtime_bench 2025-09-07T06:43:53.1565557Z inflating: build/bin/static_runtime_test 2025-09-07T06:43:53.1615787Z inflating: build/bin/Dict_test 2025-09-07T06:43:53.1651830Z inflating: build/bin/Dimname_test 2025-09-07T06:43:53.1696101Z inflating: build/bin/MaybeOwned_test 2025-09-07T06:43:53.1735202Z inflating: build/bin/NamedTensor_test 2025-09-07T06:43:53.1778358Z inflating: build/bin/apply_utils_test 2025-09-07T06:43:53.1818676Z 
inflating: build/bin/atest 2025-09-07T06:43:53.1863141Z inflating: build/bin/basic 2025-09-07T06:43:53.1900269Z inflating: build/bin/broadcast_test 2025-09-07T06:43:53.1935361Z inflating: build/bin/cpu_allocator_test 2025-09-07T06:43:53.1975128Z inflating: build/bin/cpu_generator_test 2025-09-07T06:43:53.2011691Z inflating: build/bin/cpu_profiling_allocator_test 2025-09-07T06:43:53.2073376Z inflating: build/bin/cpu_rng_test 2025-09-07T06:43:53.2108675Z inflating: build/bin/dlconvertor_test 2025-09-07T06:43:53.2147863Z inflating: build/bin/extension_backend_test 2025-09-07T06:43:53.2188230Z inflating: build/bin/half_test 2025-09-07T06:43:53.2252163Z inflating: build/bin/ivalue_test 2025-09-07T06:43:53.2289214Z inflating: build/bin/math_kernel_test 2025-09-07T06:43:53.2325840Z inflating: build/bin/memory_format_test 2025-09-07T06:43:53.2364748Z inflating: build/bin/lazy_tensor_test 2025-09-07T06:43:53.2397325Z inflating: build/bin/memory_overlapping_test 2025-09-07T06:43:53.2432466Z inflating: build/bin/operator_name_test 2025-09-07T06:43:53.2468942Z inflating: build/bin/mobile_memory_cleanup 2025-09-07T06:43:53.2507421Z inflating: build/bin/native_test 2025-09-07T06:43:53.2543181Z inflating: build/bin/packedtensoraccessor_test 2025-09-07T06:43:53.2578185Z inflating: build/bin/operators_test 2025-09-07T06:43:53.2626571Z inflating: build/bin/pow_test 2025-09-07T06:43:53.2665571Z inflating: build/bin/quantized_test 2025-09-07T06:43:53.2700025Z inflating: build/bin/reduce_ops_test 2025-09-07T06:43:53.2735186Z inflating: build/bin/reportMemoryUsage_test 2025-09-07T06:43:53.2770777Z inflating: build/bin/StorageUtils_test 2025-09-07T06:43:53.2810705Z inflating: build/bin/scalar_test 2025-09-07T06:43:53.2849655Z inflating: build/bin/scalar_tensor_test 2025-09-07T06:43:53.2885380Z inflating: build/bin/stride_properties_test 2025-09-07T06:43:53.2923379Z inflating: build/bin/type_ptr_test 2025-09-07T06:43:53.2976301Z inflating: build/bin/tensor_iterator_test 
2025-09-07T06:43:53.3019665Z inflating: build/bin/type_test 2025-09-07T06:43:53.3054399Z inflating: build/bin/thread_init_test 2025-09-07T06:43:53.3091796Z inflating: build/bin/test_parallel 2025-09-07T06:43:53.3128005Z inflating: build/bin/undefined_tensor_test 2025-09-07T06:43:53.3164998Z inflating: build/bin/verify_api_visibility 2025-09-07T06:43:53.3209705Z inflating: build/bin/legacy_vmap_test 2025-09-07T06:43:53.3244829Z inflating: build/bin/weakref_test 2025-09-07T06:43:53.3280277Z inflating: build/bin/wrapdim_test 2025-09-07T06:43:53.3315664Z inflating: build/bin/xla_tensor_test 2025-09-07T06:43:53.3386004Z inflating: build/bin/List_test 2025-09-07T06:43:53.3429643Z inflating: build/bin/IListRef_test 2025-09-07T06:43:53.3492619Z inflating: build/bin/kernel_function_test 2025-09-07T06:43:53.3572040Z inflating: build/bin/kernel_function_legacy_test 2025-09-07T06:43:53.3654308Z inflating: build/bin/kernel_lambda_legacy_test 2025-09-07T06:43:53.3699128Z inflating: build/bin/KernelFunction_test 2025-09-07T06:43:53.3740722Z inflating: build/bin/kernel_stackbased_test 2025-09-07T06:43:53.3810704Z inflating: build/bin/kernel_lambda_test 2025-09-07T06:43:53.3845955Z inflating: build/bin/CppSignature_test 2025-09-07T06:43:53.3909220Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-09-07T06:43:53.3942898Z inflating: build/bin/op_allowlist_test 2025-09-07T06:43:53.3980873Z inflating: build/bin/backend_fallback_test 2025-09-07T06:43:53.4026331Z inflating: build/bin/inline_container_test 2025-09-07T06:43:53.4060340Z inflating: build/bin/hip_complex_math_test 2025-09-07T06:43:53.4262047Z inflating: build/bin/op_registration_test 2025-09-07T06:43:53.4295644Z inflating: build/bin/hip_complex_test 2025-09-07T06:43:53.4332176Z inflating: build/bin/hip_apply_test 2025-09-07T06:43:53.4365944Z inflating: build/bin/hip_distributions_test 2025-09-07T06:43:53.4399953Z inflating: build/bin/hip_generator_test 2025-09-07T06:43:53.4434079Z inflating: 
build/bin/hip_half_test 2025-09-07T06:43:53.4467840Z inflating: build/bin/hip_integer_divider_test 2025-09-07T06:43:53.4501511Z inflating: build/bin/hip_optional_test 2025-09-07T06:43:53.4535596Z inflating: build/bin/hip_packedtensoraccessor_test 2025-09-07T06:43:53.4574511Z inflating: build/bin/hip_dlconvertor_test 2025-09-07T06:43:53.4608033Z inflating: build/bin/hip_vectorized_test 2025-09-07T06:43:53.5312418Z inflating: build/bin/test_jit 2025-09-07T06:43:53.5555633Z inflating: build/bin/test_nativert 2025-09-07T06:43:53.5593710Z inflating: build/bin/test_dist_autograd 2025-09-07T06:43:53.5639571Z inflating: build/bin/test_cpp_rpc 2025-09-07T06:43:53.5644093Z inflating: build/bin/parallel_benchmark 2025-09-07T06:43:53.6378082Z inflating: build/bin/test_api 2025-09-07T06:43:53.6599821Z inflating: build/bin/test_lazy 2025-09-07T06:43:53.6600208Z creating: .additional_ci_files/ 2025-09-07T06:43:53.6655625Z inflating: .additional_ci_files/test-times.json 2025-09-07T06:43:53.6871672Z inflating: .additional_ci_files/test-class-times.json 2025-09-07T06:43:53.6909506Z ##[group]Run rm artifacts.zip 2025-09-07T06:43:53.6909657Z rm artifacts.zip 2025-09-07T06:43:53.6915749Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:53.6915895Z env: 2025-09-07T06:43:53.6915983Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:53.6916113Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:53.6916284Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:53.6916445Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:53.6919194Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:53.6919571Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:53.6919680Z AWS_REGION: us-east-1 2025-09-07T06:43:53.6919877Z 
AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:53.6920089Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:53.6922147Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:53.6922246Z ##[endgroup] 2025-09-07T06:43:53.8025570Z ##[group]Run df -H 2025-09-07T06:43:53.8025684Z df -H 2025-09-07T06:43:53.8029661Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:53.8029810Z env: 2025-09-07T06:43:53.8029906Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:53.8030043Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:53.8030219Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:53.8030385Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:53.8030767Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:53.8031281Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:53.8031396Z AWS_REGION: us-east-1 2025-09-07T06:43:53.8031542Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:53.8031700Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:53.8033786Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:53.8033901Z ##[endgroup] 2025-09-07T06:43:53.8211956Z Filesystem Size Used Avail Use% Mounted on 2025-09-07T06:43:53.8212247Z overlay 2.2T 376G 1.8T 18% / 2025-09-07T06:43:53.8212500Z tmpfs 68M 0 68M 0% /dev 2025-09-07T06:43:53.8212724Z /dev/vda1 2.2T 376G 1.8T 18% /run 2025-09-07T06:43:53.8212948Z shm 68M 4.1k 68M 1% /dev/shm 2025-09-07T06:43:53.8213336Z tmpfs 1.4T 13k 1.4T 1% /run/secrets/kubernetes.io/serviceaccount 2025-09-07T06:43:53.8213723Z tmpfs 675G 0 675G 0% /proc/acpi 2025-09-07T06:43:53.8213974Z tmpfs 675G 0 675G 0% /proc/scsi 2025-09-07T06:43:53.8214202Z tmpfs 675G 0 675G 0% /sys/firmware 2025-09-07T06:43:53.8249444Z Prepare all required actions 2025-09-07T06:43:53.8249640Z Getting action download info 2025-09-07T06:43:53.9904291Z ##[group]Run 
./.github/actions/download-td-artifacts 2025-09-07T06:43:53.9904452Z with: 2025-09-07T06:43:53.9904541Z env: 2025-09-07T06:43:53.9904636Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:53.9907410Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:53.9907590Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:53.9907751Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:53.9908197Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:53.9908583Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:53.9908724Z AWS_REGION: us-east-1 2025-09-07T06:43:53.9908909Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:53.9909076Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:53.9911170Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:53.9911270Z ##[endgroup] 2025-09-07T06:43:53.9945875Z ##[group]Run seemethere/download-artifact-s3@v4 2025-09-07T06:43:53.9946023Z with: 2025-09-07T06:43:53.9946111Z name: td_results 2025-09-07T06:43:53.9946209Z s3-bucket: gha-artifacts 2025-09-07T06:43:53.9946312Z region: us-east-1 2025-09-07T06:43:53.9946401Z env: 2025-09-07T06:43:53.9946605Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:53.9946735Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:53.9946911Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:53.9947071Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:53.9947448Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:53.9950615Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:53.9950741Z AWS_REGION: us-east-1 2025-09-07T06:43:53.9950876Z 
AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:53.9971760Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:53.9973920Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:53.9974038Z ##[endgroup] 2025-09-07T06:43:54.2298828Z (node:4954) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-09-07T06:43:54.2299620Z 2025-09-07T06:43:54.2300194Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-09-07T06:43:54.2300572Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-09-07T06:43:54.2300872Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-09-07T06:43:55.3395671Z Found 1 objects with prefix pytorch/pytorch/17524754565/td_results/ 2025-09-07T06:43:55.3397044Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-09-07T06:43:55.5054726Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-09-07T06:43:55.5067636Z Artifact download has finished successfully 2025-09-07T06:43:55.5227492Z ##[group]Run mkdir -p .additional_ci_files 2025-09-07T06:43:55.5227703Z mkdir -p .additional_ci_files 2025-09-07T06:43:55.5227914Z mv td_results.json .additional_ci_files/td_results.json || true 2025-09-07T06:43:55.5233962Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:55.5234115Z env: 2025-09-07T06:43:55.5234208Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:55.5234342Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:55.5234523Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:55.5234693Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:55.5235244Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:55.5235641Z AWS_DEFAULT_REGION: us-east-1 
2025-09-07T06:43:55.5235763Z AWS_REGION: us-east-1 2025-09-07T06:43:55.5235924Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:55.5242730Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:55.5244814Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:55.5244917Z ##[endgroup] 2025-09-07T06:43:55.5403401Z ##[group]Run .github/scripts/parse_ref.py 2025-09-07T06:43:55.5403567Z .github/scripts/parse_ref.py 2025-09-07T06:43:55.5412720Z shell: /usr/bin/bash -e {0} 2025-09-07T06:43:55.5412844Z env: 2025-09-07T06:43:55.5412941Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:55.5413087Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:55.5416454Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:55.5416747Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:55.5417121Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:55.5417482Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:55.5417594Z AWS_REGION: us-east-1 2025-09-07T06:43:55.5417771Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:55.5417936Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:55.5420021Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:55.5420123Z ##[endgroup] 2025-09-07T06:43:55.5693372Z Setting output branch=main 2025-09-07T06:43:55.5767276Z Prepare all required actions 2025-09-07T06:43:55.5767488Z Getting action download info 2025-09-07T06:43:55.7133309Z ##[group]Run ./.github/actions/filter-test-configs 2025-09-07T06:43:55.7133446Z with: 2025-09-07T06:43:55.7133684Z github-token: *** 2025-09-07T06:43:55.7134274Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": 
"linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}]} 2025-09-07T06:43:55.7134963Z job-name: linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:43:55.7135160Z env: 2025-09-07T06:43:55.7135248Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:55.7135381Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:55.7135549Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:55.7135709Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:55.7136217Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:55.7136660Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:55.7136773Z AWS_REGION: us-east-1 2025-09-07T06:43:55.7136890Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:55.7137035Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:55.7141816Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:55.7141930Z ##[endgroup] 2025-09-07T06:43:55.7166027Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T06:43:55.7166146Z with: 2025-09-07T06:43:55.7166231Z shell: bash 2025-09-07T06:43:55.7166316Z timeout_minutes: 10 2025-09-07T06:43:55.7166410Z max_attempts: 5 2025-09-07T06:43:55.7166618Z retry_wait_seconds: 30 2025-09-07T06:43:55.7166904Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T06:43:55.7167201Z polling_interval_seconds: 1 2025-09-07T06:43:55.7167310Z warning_on_retry: true 2025-09-07T06:43:55.7167408Z continue_on_error: false 
2025-09-07T06:43:55.7169635Z env: 2025-09-07T06:43:55.7169725Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:55.7169854Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:55.7170023Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:55.7170180Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:55.7170551Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:55.7170909Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:55.7171017Z AWS_REGION: us-east-1 2025-09-07T06:43:55.7171144Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:55.7171291Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:55.7173350Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:55.7175503Z GITHUB_TOKEN: *** 2025-09-07T06:43:55.7175595Z ##[endgroup] 2025-09-07T06:43:55.7574908Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-09-07T06:43:55.8997130Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T06:43:56.1541992Z Collecting requests==2.27.1 2025-09-07T06:43:56.2598403Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-09-07T06:43:56.4763516Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.1/63.1 KB 246.7 kB/s eta 0:00:00 2025-09-07T06:43:56.6139914Z Collecting pyyaml==6.0.2 2025-09-07T06:43:56.6406234Z Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB) 2025-09-07T06:43:56.7963473Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 4.9 MB/s eta 0:00:00 2025-09-07T06:43:56.9644418Z Collecting idna<4,>=2.5 2025-09-07T06:43:56.9905931Z Downloading idna-3.10-py3-none-any.whl (70 kB) 2025-09-07T06:43:57.0648259Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 KB 838.1 kB/s eta 0:00:00 2025-09-07T06:43:57.3092601Z Collecting charset-normalizer~=2.0.0 
2025-09-07T06:43:57.3364432Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-09-07T06:43:57.8093372Z Collecting certifi>=2017.4.17 2025-09-07T06:43:57.8358006Z Downloading certifi-2025.8.3-py3-none-any.whl (161 kB) 2025-09-07T06:43:57.9455177Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 161.2/161.2 KB 1.4 MB/s eta 0:00:00 2025-09-07T06:43:58.0367812Z Collecting urllib3<1.27,>=1.21.1 2025-09-07T06:43:58.0629700Z Downloading urllib3-1.26.20-py2.py3-none-any.whl (144 kB) 2025-09-07T06:43:58.1099053Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.2/144.2 KB 2.9 MB/s eta 0:00:00 2025-09-07T06:43:58.1608451Z Installing collected packages: urllib3, pyyaml, idna, charset-normalizer, certifi, requests 2025-09-07T06:43:58.9334698Z WARNING: The script normalizer is installed in '/home/runner/.local/bin' which is not on PATH. 2025-09-07T06:43:58.9336258Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T06:43:59.2866596Z Successfully installed certifi-2025.8.3 charset-normalizer-2.0.12 idna-3.10 pyyaml-6.0.2 requests-2.27.1 urllib3-1.26.20 2025-09-07T06:43:59.7612385Z Command completed after 1 attempt(s). 
2025-09-07T06:43:59.7681960Z ##[group]Run set -x 2025-09-07T06:43:59.7682080Z set -x 2025-09-07T06:43:59.7682168Z 2025-09-07T06:43:59.7682314Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T06:43:59.7684188Z # in runner workspace 2025-09-07T06:43:59.7684344Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-09-07T06:43:59.7689918Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:59.7690056Z env: 2025-09-07T06:43:59.7690143Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:59.7690290Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:59.7690456Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:59.7690611Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:59.7690978Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:59.7691338Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:59.7693154Z AWS_REGION: us-east-1 2025-09-07T06:43:59.7693313Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:59.7693460Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:59.7695513Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:59.7695617Z ##[endgroup] 2025-09-07T06:43:59.7724040Z + python3 /home/runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-09-07T06:43:59.7819349Z Setting output branch=main 2025-09-07T06:43:59.7847742Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T06:43:59.7847905Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-09-07T06:43:59.7848032Z echo "Job name: ${JOB_NAME}" 2025-09-07T06:43:59.7848144Z 2025-09-07T06:43:59.7848287Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-09-07T06:43:59.7850305Z # in runner workspace 2025-09-07T06:43:59.7850467Z python3
"${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-09-07T06:43:59.7850641Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-09-07T06:43:59.7850767Z  --job-name "${JOB_NAME}" \ 2025-09-07T06:43:59.7851537Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}]}" \ 2025-09-07T06:43:59.7852164Z  --selected-test-configs "" \ 2025-09-07T06:43:59.7852288Z  --pr-number "${PR_NUMBER}" \ 2025-09-07T06:43:59.7852407Z  --tag "${TAG}" \ 2025-09-07T06:43:59.7852516Z  --event-name "${EVENT_NAME}" \ 2025-09-07T06:43:59.7852633Z  --schedule "${SCHEDULE}" \ 2025-09-07T06:43:59.7852747Z  --branch "${HEAD_BRANCH}" 2025-09-07T06:43:59.7858419Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:43:59.7858555Z env: 2025-09-07T06:43:59.7858641Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:43:59.7858768Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:43:59.7858934Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:43:59.7859204Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:43:59.7859568Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:43:59.7859923Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:43:59.7860032Z AWS_REGION: us-east-1 2025-09-07T06:43:59.7861901Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:43:59.7862054Z 
AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:43:59.7864115Z AWS_SESSION_TOKEN: *** 2025-09-07T06:43:59.7864307Z GITHUB_TOKEN: *** 2025-09-07T06:43:59.7864476Z JOB_NAME: linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:43:59.7864653Z PR_NUMBER: 2025-09-07T06:43:59.7864736Z TAG: 2025-09-07T06:43:59.7864815Z EVENT_NAME: push 2025-09-07T06:43:59.7864903Z SCHEDULE: 2025-09-07T06:43:59.7864990Z HEAD_BRANCH: main 2025-09-07T06:43:59.7865078Z ##[endgroup] 2025-09-07T06:43:59.7891196Z Workflow: rocm-mi300 2025-09-07T06:43:59.7891661Z Job name: linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:44:00.0283375Z Setting output keep-going=True 2025-09-07T06:44:00.0283863Z Setting output ci-verbose-test-logs=False 2025-09-07T06:44:00.0284255Z Setting output ci-test-showlocals=False 2025-09-07T06:44:00.0284613Z Setting output ci-no-test-timeout=False 2025-09-07T06:44:00.0284950Z Setting output ci-no-td=False 2025-09-07T06:44:00.0285274Z Setting output ci-td-distributed=False 2025-09-07T06:44:00.0285627Z Setting output is-unstable=False 2025-09-07T06:44:00.0286018Z Setting output reenabled-issues= 2025-09-07T06:44:00.0288085Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}]} 2025-09-07T06:44:00.0299238Z Setting output is-test-matrix-empty=False 2025-09-07T06:44:00.0346201Z ##[group]Run echo "Filtered matrix:" 2025-09-07T06:44:00.0346408Z echo "Filtered matrix:" 
2025-09-07T06:44:00.0347561Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1"}]}" 2025-09-07T06:44:00.0348384Z  2025-09-07T06:44:00.0348502Z echo 2025-09-07T06:44:00.0352719Z echo "Is the current job unstable? False" 2025-09-07T06:44:00.0352862Z  2025-09-07T06:44:00.0352950Z echo 2025-09-07T06:44:00.0353065Z echo "Is keep-going label set? True" 2025-09-07T06:44:00.0353188Z  2025-09-07T06:44:00.0353271Z echo 2025-09-07T06:44:00.0353368Z echo "Reenabled issues? " 2025-09-07T06:44:00.0357792Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:44:00.0357941Z env: 2025-09-07T06:44:00.0358040Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:44:00.0358182Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:44:00.0358361Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:44:00.0360601Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:44:00.0361080Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:44:00.0361436Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:44:00.0361548Z AWS_REGION: us-east-1 2025-09-07T06:44:00.0361673Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:44:00.0361819Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:44:00.0363923Z AWS_SESSION_TOKEN: *** 2025-09-07T06:44:00.0364023Z ##[endgroup] 2025-09-07T06:44:00.0384035Z 
Filtered matrix: 2025-09-07T06:44:00.0384861Z {include: [{config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1}]} 2025-09-07T06:44:00.0385531Z 2025-09-07T06:44:00.0385607Z Is the current job unstable? False 2025-09-07T06:44:00.0385697Z 2025-09-07T06:44:00.0385746Z Is keep-going label set? True 2025-09-07T06:44:00.0385824Z 2025-09-07T06:44:00.0385865Z Reenabled issues? 2025-09-07T06:44:00.0419030Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T06:44:00.0419235Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-09-07T06:44:00.0424981Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:44:00.0425125Z env: 2025-09-07T06:44:00.0425215Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:44:00.0425342Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:44:00.0425511Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:44:00.0425669Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:44:00.0426047Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:44:00.0426405Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:44:00.0426631Z AWS_REGION: us-east-1 2025-09-07T06:44:00.0426833Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:44:00.0426991Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:44:00.0429058Z AWS_SESSION_TOKEN: *** 2025-09-07T06:44:00.0429171Z JOB_TIMEOUT: 300 2025-09-07T06:44:00.0429274Z 
##[endgroup] 2025-09-07T06:44:00.0501087Z ##[group]Run set -x 2025-09-07T06:44:00.0501224Z set -x 2025-09-07T06:44:00.0501312Z  2025-09-07T06:44:00.0501409Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-09-07T06:44:00.0501560Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-09-07T06:44:00.0501710Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-09-07T06:44:00.0501870Z  TEST_COMMAND=.ci/caffe2/test.sh 2025-09-07T06:44:00.0501983Z else 2025-09-07T06:44:00.0502081Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T06:44:00.0502194Z fi 2025-09-07T06:44:00.0502273Z  2025-09-07T06:44:00.0502397Z # detached container should get cleaned up by teardown_ec2_linux 2025-09-07T06:44:00.0502591Z # TODO: Stop building test binaries as part of the build phase 2025-09-07T06:44:00.0502762Z # Used for GPU_FLAG since that doesn't play nice 2025-09-07T06:44:00.0502920Z # shellcheck disable=SC2086,SC2090 2025-09-07T06:44:00.0505131Z container_name=$(docker run \ 2025-09-07T06:44:00.0505252Z  ${GPU_FLAG:-} \ 2025-09-07T06:44:00.0505361Z  -e BUILD_ENVIRONMENT \ 2025-09-07T06:44:00.0505476Z  -e PR_NUMBER \ 2025-09-07T06:44:00.0505579Z  -e GITHUB_ACTIONS \ 2025-09-07T06:44:00.0505687Z  -e GITHUB_REPOSITORY \ 2025-09-07T06:44:00.0505935Z  -e GITHUB_WORKFLOW \ 2025-09-07T06:44:00.0506044Z  -e GITHUB_JOB \ 2025-09-07T06:44:00.0506144Z  -e GITHUB_RUN_ID \ 2025-09-07T06:44:00.0506248Z  -e GITHUB_RUN_NUMBER \ 2025-09-07T06:44:00.0506358Z  -e GITHUB_RUN_ATTEMPT \ 2025-09-07T06:44:00.0506470Z  -e JOB_ID \ 2025-09-07T06:44:00.0506671Z  -e JOB_NAME \ 2025-09-07T06:44:00.0506769Z  -e BRANCH \ 2025-09-07T06:44:00.0506861Z  -e SHA1 \ 2025-09-07T06:44:00.0506959Z  -e AWS_DEFAULT_REGION \ 2025-09-07T06:44:00.0507071Z  -e IN_WHEEL_TEST \ 2025-09-07T06:44:00.0507175Z  -e SHARD_NUMBER \ 2025-09-07T06:44:00.0507276Z  -e TEST_CONFIG \ 2025-09-07T06:44:00.0507379Z  -e NUM_TEST_SHARDS \ 2025-09-07T06:44:00.0507487Z  -e REENABLED_ISSUES \ 2025-09-07T06:44:00.0507600Z  -e CONTINUE_THROUGH_ERROR \ 
2025-09-07T06:44:00.0507714Z  -e VERBOSE_TEST_LOGS \ 2025-09-07T06:44:00.0507827Z  -e TEST_SHOWLOCALS \ 2025-09-07T06:44:00.0507932Z  -e NO_TEST_TIMEOUT \ 2025-09-07T06:44:00.0509644Z  -e NO_TD \ 2025-09-07T06:44:00.0509753Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-09-07T06:44:00.0509889Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-09-07T06:44:00.0510021Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-09-07T06:44:00.0510145Z  -e TESTS_TO_INCLUDE \ 2025-09-07T06:44:00.0510253Z  -e DASHBOARD_TAG \ 2025-09-07T06:44:00.0510393Z  --env-file="${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" \ 2025-09-07T06:44:00.0510545Z  --ulimit stack=10485760:83886080 \ 2025-09-07T06:44:00.0510661Z  --ulimit core=0 \ 2025-09-07T06:44:00.0510775Z  --security-opt seccomp=unconfined \ 2025-09-07T06:44:00.0510903Z  --cap-add=SYS_PTRACE \ 2025-09-07T06:44:00.0512431Z  --shm-size="8g" \ 2025-09-07T06:44:00.0512529Z  --tty \ 2025-09-07T06:44:00.0512623Z  --detach \ 2025-09-07T06:44:00.0512726Z  --name="${container_name}" \ 2025-09-07T06:44:00.0512840Z  --user jenkins \ 2025-09-07T06:44:00.0512970Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-09-07T06:44:00.0513115Z  -w /var/lib/jenkins/workspace \ 2025-09-07T06:44:00.0513233Z  "${DOCKER_IMAGE}" 2025-09-07T06:44:00.0513329Z ) 2025-09-07T06:44:00.0513423Z # save container name for later step 2025-09-07T06:44:00.0515013Z echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV" 2025-09-07T06:44:00.0515283Z # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home 2025-09-07T06:44:00.0515619Z docker exec -t "${container_name}" sh -c "cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" 2025-09-07T06:44:00.0521236Z shell: /usr/bin/bash -e {0} 2025-09-07T06:44:00.0521343Z env: 2025-09-07T06:44:00.0521427Z GIT_DEFAULT_BRANCH: main 2025-09-07T06:44:00.0521557Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T06:44:00.0521728Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T06:44:00.0521884Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T06:44:00.0522256Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T06:44:00.0524366Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:44:00.0524476Z AWS_REGION: us-east-1 2025-09-07T06:44:00.0524628Z AWS_ACCESS_KEY_ID: *** 2025-09-07T06:44:00.0524775Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T06:44:00.0526939Z AWS_SESSION_TOKEN: *** 2025-09-07T06:44:00.0527061Z BUILD_ENVIRONMENT: linux-noble-rocm-py3.12-mi300 2025-09-07T06:44:00.0527191Z PR_NUMBER: 2025-09-07T06:44:00.0527382Z GITHUB_REPOSITORY: pytorch/pytorch 2025-09-07T06:44:00.0527503Z GITHUB_WORKFLOW: rocm-mi300 2025-09-07T06:44:00.0527606Z GITHUB_JOB: test 2025-09-07T06:44:00.0527696Z GITHUB_RUN_ID: 17524754565 2025-09-07T06:44:00.0527800Z GITHUB_RUN_NUMBER: 9398 2025-09-07T06:44:00.0527899Z GITHUB_RUN_ATTEMPT: 1 2025-09-07T06:44:00.0527992Z JOB_ID: 49774353529 2025-09-07T06:44:00.0528154Z JOB_NAME: linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:44:00.0528332Z BRANCH: main 2025-09-07T06:44:00.0529945Z SHA1: 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:00.0530073Z CONTINUE_THROUGH_ERROR: True 2025-09-07T06:44:00.0530180Z VERBOSE_TEST_LOGS: False 2025-09-07T06:44:00.0530282Z TEST_SHOWLOCALS: False 2025-09-07T06:44:00.0530380Z NO_TEST_TIMEOUT: False 2025-09-07T06:44:00.0530473Z NO_TD: 
False 2025-09-07T06:44:00.0530556Z TEST_CONFIG: default 2025-09-07T06:44:00.0530647Z SHARD_NUMBER: 6 2025-09-07T06:44:00.0530734Z NUM_TEST_SHARDS: 6 2025-09-07T06:44:00.0530829Z REENABLED_ISSUES: 2025-09-07T06:44:00.0531091Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:44:00.0532731Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0 2025-09-07T06:44:00.0532855Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-09-07T06:44:00.0532969Z TESTS_TO_INCLUDE: 2025-09-07T06:44:00.0533058Z DASHBOARD_TAG: 2025-09-07T06:44:00.0533146Z ##[endgroup] 2025-09-07T06:44:00.0556356Z + [[ default == \m\u\l\t\i\g\p\u ]] 2025-09-07T06:44:00.0556993Z + [[ linux-noble-rocm-py3.12-mi300 == *onnx* ]] 2025-09-07T06:44:00.0557202Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-09-07T06:44:00.0565778Z +++ nproc --ignore=2 2025-09-07T06:44:00.0577536Z ++ docker run --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e MAX_JOBS=158 -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e TESTS_TO_INCLUDE -e DASHBOARD_TAG --env-file=/home/runner/_work/_temp/github_env_17524754565 --ulimit stack=10485760:83886080 --ulimit core=0 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=8g --tty --detach --name= --user jenkins -v /home/runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w 
/var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-ae53c6842aa4c2407d0ad976491ca941c2635c77 2025-09-07T06:44:05.4205034Z + container_name=a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T06:44:05.4215887Z + echo CONTAINER_NAME=a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T06:44:05.4216356Z + docker exec -t a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f sh -c 'cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh' 2025-09-07T06:44:09.7868473Z Processing ./dist/torch-2.9.0a0+git93fb23d-cp312-cp312-linux_x86_64.whl 2025-09-07T06:44:10.0734447Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (3.19.1) 2025-09-07T06:44:10.0735635Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (4.15.0) 2025-09-07T06:44:10.0739063Z Requirement already satisfied: setuptools in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (80.9.0) 2025-09-07T06:44:10.0740595Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (1.13.3) 2025-09-07T06:44:10.0741644Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (2.8.8) 2025-09-07T06:44:10.0742455Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (3.1.6) 2025-09-07T06:44:10.0748758Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.9.0a0+git93fb23d) (2025.7.0) 2025-09-07T06:44:10.0790886Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages (from sympy>=1.13.3->torch==2.9.0a0+git93fb23d) (1.3.0) 2025-09-07T06:44:10.0805227Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from jinja2->torch==2.9.0a0+git93fb23d) (3.0.2) 2025-09-07T06:44:10.1860054Z Installing collected packages: torch 2025-09-07T06:44:15.2920717Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-09-07T06:44:15.2921267Z helion 0.1.3 requires filecheck, which is not installed. 2025-09-07T06:44:15.2921485Z Successfully installed torch-2.9.0a0+git93fb23d 2025-09-07T06:44:15.3308914Z + export TERM=vt100 2025-09-07T06:44:15.3309122Z + TERM=vt100 2025-09-07T06:44:15.3309281Z ++ dirname .ci/pytorch/test.sh 2025-09-07T06:44:15.3317893Z + source .ci/pytorch/common.sh 2025-09-07T06:44:15.3320801Z +++ dirname .ci/pytorch/common.sh 2025-09-07T06:44:15.3325117Z ++ source .ci/pytorch/common_utils.sh 2025-09-07T06:44:15.3325265Z +++ declare -f -t trap_add 2025-09-07T06:44:15.3327737Z ++ set -ex -o pipefail 2025-09-07T06:44:15.3328024Z ++ [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-09-07T06:44:15.3328202Z ++ unset HIP_PLATFORM 2025-09-07T06:44:15.3328319Z ++ export PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:44:15.3331267Z ++ PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:44:15.3331393Z ++ BUILD_TEST_LIBTORCH=0 2025-09-07T06:44:15.3331888Z ++ dirname .ci/pytorch/test.sh 2025-09-07T06:44:15.3336212Z + source .ci/pytorch/common-build.sh 2025-09-07T06:44:15.3336588Z ++ [[ linux-noble-rocm-py3.12-mi300 != *win-* ]] 2025-09-07T06:44:15.3345020Z ++++ dirname .ci/pytorch/common-build.sh 2025-09-07T06:44:15.3356115Z +++ cd .ci/pytorch 2025-09-07T06:44:15.3356360Z +++ pwd -P 2025-09-07T06:44:15.3360143Z ++ script_dir=/var/lib/jenkins/pytorch/.ci/pytorch 2025-09-07T06:44:15.3360381Z ++ [[ linux-noble-rocm-py3.12-mi300 == *-pch* ]] 
2025-09-07T06:44:15.3365572Z ++ which sccache 2025-09-07T06:44:15.3373613Z ++ [[ -z '' ]] 2025-09-07T06:44:15.3373779Z ++ unset SCCACHE_BUCKET 2025-09-07T06:44:15.3375265Z ++ unset SCCACHE_REGION 2025-09-07T06:44:15.3375634Z ++ sccache --stop-server 2025-09-07T06:44:15.3398358Z ++ true 2025-09-07T06:44:15.3398631Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-09-07T06:44:15.3405626Z ++ trap_add sccache_epilogue EXIT 2025-09-07T06:44:15.3405887Z ++ trap_add_cmd=sccache_epilogue 2025-09-07T06:44:15.3406089Z ++ shift 2025-09-07T06:44:15.3406257Z ++ for trap_add_name in "$@" 2025-09-07T06:44:15.3415868Z ++++ trap -p EXIT 2025-09-07T06:44:15.3423656Z +++ eval 'extract_trap_cmd ' 2025-09-07T06:44:15.3424293Z ++++ extract_trap_cmd 2025-09-07T06:44:15.3424430Z ++++ printf '%s\n' '' 2025-09-07T06:44:15.3424832Z +++ printf '%s\n' sccache_epilogue 2025-09-07T06:44:15.3425014Z ++ trap -- ' 2025-09-07T06:44:15.3425160Z sccache_epilogue' EXIT 2025-09-07T06:44:15.3425320Z ++ [[ -n '' ]] 2025-09-07T06:44:15.3425474Z ++ [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-09-07T06:44:15.3425698Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-09-07T06:44:15.3425898Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-09-07T06:44:15.3426040Z ++ sccache --start-server 2025-09-07T06:44:15.3438959Z sccache: Starting the server... 2025-09-07T06:44:15.3681194Z sccache: Listening on address 127.0.0.1:4226 2025-09-07T06:44:15.3687361Z ++ sccache --zero-stats 2025-09-07T06:44:15.3704231Z Statistics zeroed. 
2025-09-07T06:44:15.3706277Z ++ which ccache 2025-09-07T06:44:15.3714093Z + [[ linux-noble-rocm-py3.12-mi300 != *rocm* ]] 2025-09-07T06:44:15.3715197Z + echo 'Environment variables:' 2025-09-07T06:44:15.3715438Z Environment variables: 2025-09-07T06:44:15.3715553Z + env 2025-09-07T06:44:15.3724987Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-09-07T06:44:15.3725184Z CONTINUE_THROUGH_ERROR=True 2025-09-07T06:44:15.3731659Z BUILD_ENVIRONMENT=linux-noble-rocm-py3.12-mi300 2025-09-07T06:44:15.3731877Z HOSTNAME=linux.rocm.gpu.gfx942.1-xb8kr-runner-hql9s 2025-09-07T06:44:15.3732152Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3732365Z GITHUB_ACTION=__self 2025-09-07T06:44:15.3732481Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T06:44:15.3732612Z GITHUB_RUN_NUMBER=9398 2025-09-07T06:44:15.3732707Z TEST_CONFIG=default 2025-09-07T06:44:15.3732810Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T06:44:15.3732939Z AWS_DEFAULT_REGION=us-east-1 2025-09-07T06:44:15.3733106Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T06:44:15.3733239Z GITHUB_REF_TYPE=branch 2025-09-07T06:44:15.3733666Z *** 2025-09-07T06:44:15.3733764Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T06:44:15.3733880Z GITHUB_ACTIONS=true 2025-09-07T06:44:15.3733996Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:15.3734147Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:15.3734351Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/rocm-mi300.yml@refs/heads/main 2025-09-07T06:44:15.3736387Z UCC_HOME=/usr 2025-09-07T06:44:15.3736622Z VERBOSE_TEST_LOGS=False 2025-09-07T06:44:15.3736730Z GITHUB_REF=refs/heads/main 2025-09-07T06:44:15.3736828Z SHARD_NUMBER=6 2025-09-07T06:44:15.3736916Z GITHUB_REF_PROTECTED=true 2025-09-07T06:44:15.3737015Z HOME=/var/lib/jenkins 2025-09-07T06:44:15.3737131Z GITHUB_API_URL=https://api.github.com 2025-09-07T06:44:15.3737255Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T06:44:15.3737369Z LANG=C.UTF-8 2025-09-07T06:44:15.3737475Z UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6 2025-09-07T06:44:15.3738948Z PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:44:15.3739052Z NUM_TEST_SHARDS=6 2025-09-07T06:44:15.3739143Z UCX_HOME=/usr 2025-09-07T06:44:15.3739323Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3740007Z JOB_NAME=linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:44:15.3740189Z MAGMA_HOME=/opt/rocm/magma 2025-09-07T06:44:15.3740369Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3740603Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-09-07T06:44:15.3740753Z GITHUB_EVENT_NAME=push 2025-09-07T06:44:15.3740903Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.11.0 2025-09-07T06:44:15.3741055Z DASHBOARD_TAG= 2025-09-07T06:44:15.3742329Z GITHUB_RUN_ID=17524754565 2025-09-07T06:44:15.3742533Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3742746Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T06:44:15.3742848Z PR_NUMBER= 2025-09-07T06:44:15.3742931Z GITHUB_RUN_ATTEMPT=1 2025-09-07T06:44:15.3743027Z ANACONDA_PYTHON_VERSION=3.12 2025-09-07T06:44:15.3743159Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T06:44:15.3743283Z TERM=vt100 2025-09-07T06:44:15.3743364Z INSTALLED_VISION=yes 2025-09-07T06:44:15.3743453Z BRANCH=main 2025-09-07T06:44:15.3743539Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T06:44:15.3744721Z TESTS_TO_INCLUDE= 2025-09-07T06:44:15.3744875Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-09-07T06:44:15.3745057Z GITHUB_SERVER_URL=https://github.com 2025-09-07T06:44:15.3745334Z 
PYTORCH_ROCM_ARCH=gfx90a;gfx942 2025-09-07T06:44:15.3745455Z UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d 2025-09-07T06:44:15.3745577Z REENABLED_ISSUES= 2025-09-07T06:44:15.3745659Z SHLVL=1 2025-09-07T06:44:15.3745738Z MAX_JOBS=158 2025-09-07T06:44:15.3745822Z GITHUB_ACTOR_ID=97764156 2025-09-07T06:44:15.3745953Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:15.3746091Z GITHUB_REF_NAME=main 2025-09-07T06:44:15.3747408Z ROCM_PATH=/opt/rocm 2025-09-07T06:44:15.3747498Z GITHUB_JOB=test 2025-09-07T06:44:15.3747586Z NO_TEST_TIMEOUT=False 2025-09-07T06:44:15.3747691Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T06:44:15.3747798Z LC_ALL=C.UTF-8 2025-09-07T06:44:15.3747883Z GITHUB_RETENTION_DAYS=90 2025-09-07T06:44:15.3747983Z OPENSSL_DIR=/opt/openssl 2025-09-07T06:44:15.3748080Z GITHUB_ACTION_REPOSITORY= 2025-09-07T06:44:15.3748425Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:44:15.3748771Z GITHUB_BASE_REF= 2025-09-07T06:44:15.3748855Z CI=true 2025-09-07T06:44:15.3749956Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T06:44:15.3750062Z JOB_ID=49774353529 2025-09-07T06:44:15.3750145Z GITHUB_HEAD_REF= 2025-09-07T06:44:15.3750229Z GITHUB_ACTION_REF= 2025-09-07T06:44:15.3750314Z TEST_SHOWLOCALS=False 2025-09-07T06:44:15.3750409Z GITHUB_WORKFLOW=rocm-mi300 2025-09-07T06:44:15.3750514Z DEBIAN_FRONTEND=noninteractive 2025-09-07T06:44:15.3750713Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3750907Z NO_TD=False 2025-09-07T06:44:15.3750990Z OLDPWD=/var/lib/jenkins 2025-09-07T06:44:15.3751080Z _=/usr/bin/env 2025-09-07T06:44:15.3752181Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-09-07T06:44:15.3805968Z + 
TORCH_INSTALL_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch 2025-09-07T06:44:15.3807317Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/bin 2025-09-07T06:44:15.3807614Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/lib 2025-09-07T06:44:15.3807846Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/test 2025-09-07T06:44:15.3808024Z + BUILD_DIR=build 2025-09-07T06:44:15.3808149Z + BUILD_RENAMED_DIR=build_renamed 2025-09-07T06:44:15.3808283Z + BUILD_BIN_DIR=build/bin 2025-09-07T06:44:15.3808405Z + SHARD_NUMBER=6 2025-09-07T06:44:15.3808507Z + NUM_TEST_SHARDS=6 2025-09-07T06:44:15.3808874Z + export TORCH_SERIALIZATION_DEBUG=1 2025-09-07T06:44:15.3813793Z + TORCH_SERIALIZATION_DEBUG=1 2025-09-07T06:44:15.3813916Z + export VALGRIND=ON 2025-09-07T06:44:15.3814015Z + VALGRIND=ON 2025-09-07T06:44:15.3814134Z + [[ linux-noble-rocm-py3.12-mi300 == *clang9* ]] 2025-09-07T06:44:15.3814278Z + [[ linux-noble-rocm-py3.12-mi300 == *xpu* ]] 2025-09-07T06:44:15.3814397Z + detect_cuda_arch 2025-09-07T06:44:15.3814525Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-09-07T06:44:15.3814659Z + [[ linux-noble-rocm-py3.12-mi300 == *s390x* ]] 2025-09-07T06:44:15.3814775Z + [[ 0 == \1 ]] 2025-09-07T06:44:15.3814866Z + [[ True == \1 ]] 2025-09-07T06:44:15.3816957Z + [[ linux-noble-rocm-py3.12-mi300 != *bazel* ]] 2025-09-07T06:44:15.3817121Z ++ realpath build/custom_test_artifacts 2025-09-07T06:44:15.3817454Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/pytorch/build/custom_test_artifacts 2025-09-07T06:44:15.3817646Z + [[ -n '' ]] 2025-09-07T06:44:15.3817739Z + echo 'Environment variables' 2025-09-07T06:44:15.3817888Z Environment variables 2025-09-07T06:44:15.3817980Z + env 2025-09-07T06:44:15.3823048Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-09-07T06:44:15.3823191Z CONTINUE_THROUGH_ERROR=True 2025-09-07T06:44:15.3823321Z BUILD_ENVIRONMENT=linux-noble-rocm-py3.12-mi300 
2025-09-07T06:44:15.3825886Z HOSTNAME=linux.rocm.gpu.gfx942.1-xb8kr-runner-hql9s 2025-09-07T06:44:15.3826145Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3826617Z GITHUB_ACTION=__self 2025-09-07T06:44:15.3826724Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2025-09-07T06:44:15.3826839Z GITHUB_RUN_NUMBER=9398 2025-09-07T06:44:15.3826934Z TEST_CONFIG=default 2025-09-07T06:44:15.3827028Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-09-07T06:44:15.3827149Z AWS_DEFAULT_REGION=us-east-1 2025-09-07T06:44:15.3827265Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-09-07T06:44:15.3827422Z GITHUB_REF_TYPE=branch 2025-09-07T06:44:15.3827663Z *** 2025-09-07T06:44:15.3827767Z GITHUB_REPOSITORY_ID=65600975 2025-09-07T06:44:15.3827881Z GITHUB_ACTIONS=true 2025-09-07T06:44:15.3828018Z SHA1=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:15.3828203Z GITHUB_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:15.3828403Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/rocm-mi300.yml@refs/heads/main 2025-09-07T06:44:15.3828589Z UCC_HOME=/usr 2025-09-07T06:44:15.3828698Z TORCH_SERIALIZATION_DEBUG=1 2025-09-07T06:44:15.3828809Z VERBOSE_TEST_LOGS=False 2025-09-07T06:44:15.3828911Z GITHUB_REF=refs/heads/main 2025-09-07T06:44:15.3829012Z SHARD_NUMBER=6 2025-09-07T06:44:15.3830635Z GITHUB_REF_PROTECTED=true 2025-09-07T06:44:15.3830749Z HOME=/var/lib/jenkins 2025-09-07T06:44:15.3830891Z GITHUB_API_URL=https://api.github.com 2025-09-07T06:44:15.3831015Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-09-07T06:44:15.3831184Z LANG=C.UTF-8 2025-09-07T06:44:15.3831340Z UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6 2025-09-07T06:44:15.3831470Z PYTORCH_TEST_WITH_ROCM=1 2025-09-07T06:44:15.3831573Z NUM_TEST_SHARDS=6 2025-09-07T06:44:15.3831660Z UCX_HOME=/usr 2025-09-07T06:44:15.3831841Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_91c6f083-d87f-4b90-8db9-5082ca97d3da 
2025-09-07T06:44:15.3832108Z JOB_NAME=linux-noble-rocm-py3.12-mi300 / test (default, 6, 6, linux.rocm.gpu.gfx942.1) 2025-09-07T06:44:15.3833477Z MAGMA_HOME=/opt/rocm/magma 2025-09-07T06:44:15.3833684Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3833957Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-09-07T06:44:15.3834111Z GITHUB_EVENT_NAME=push 2025-09-07T06:44:15.3834254Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.11.0 2025-09-07T06:44:15.3834408Z DASHBOARD_TAG= 2025-09-07T06:44:15.3834494Z GITHUB_RUN_ID=17524754565 2025-09-07T06:44:15.3834692Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3834985Z GITHUB_ACTOR=pytorchmergebot 2025-09-07T06:44:15.3835091Z PR_NUMBER= 2025-09-07T06:44:15.3835174Z GITHUB_RUN_ATTEMPT=1 2025-09-07T06:44:15.3836382Z VALGRIND=ON 2025-09-07T06:44:15.3836671Z ANACONDA_PYTHON_VERSION=3.12 2025-09-07T06:44:15.3836803Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-09-07T06:44:15.3836928Z TERM=vt100 2025-09-07T06:44:15.3837009Z INSTALLED_VISION=yes 2025-09-07T06:44:15.3837105Z BRANCH=main 2025-09-07T06:44:15.3837193Z OPENSSL_ROOT_DIR=/opt/openssl 2025-09-07T06:44:15.3837427Z TESTS_TO_INCLUDE= 2025-09-07T06:44:15.3837578Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-09-07T06:44:15.3837770Z GITHUB_SERVER_URL=https://github.com 2025-09-07T06:44:15.3837893Z PYTORCH_ROCM_ARCH=gfx90a;gfx942 2025-09-07T06:44:15.3839129Z UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d 2025-09-07T06:44:15.3839255Z REENABLED_ISSUES= 2025-09-07T06:44:15.3839339Z SHLVL=1 2025-09-07T06:44:15.3839415Z MAX_JOBS=158 2025-09-07T06:44:15.3839505Z GITHUB_ACTOR_ID=97764156 2025-09-07T06:44:15.3839633Z GITHUB_WORKFLOW_SHA=93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:44:15.3839769Z 
GITHUB_REF_NAME=main 2025-09-07T06:44:15.3839859Z ROCM_PATH=/opt/rocm 2025-09-07T06:44:15.3839944Z GITHUB_JOB=test 2025-09-07T06:44:15.3840031Z NO_TEST_TIMEOUT=False 2025-09-07T06:44:15.3840132Z GITHUB_REPOSITORY=pytorch/pytorch 2025-09-07T06:44:15.3841461Z LC_ALL=C.UTF-8 2025-09-07T06:44:15.3841556Z GITHUB_RETENTION_DAYS=90 2025-09-07T06:44:15.3841689Z OPENSSL_DIR=/opt/openssl 2025-09-07T06:44:15.3841803Z GITHUB_ACTION_REPOSITORY= 2025-09-07T06:44:15.3842149Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:44:15.3842487Z GITHUB_BASE_REF= 2025-09-07T06:44:15.3842570Z CI=true 2025-09-07T06:44:15.3842653Z GITHUB_REPOSITORY_OWNER=pytorch 2025-09-07T06:44:15.3842764Z JOB_ID=49774353529 2025-09-07T06:44:15.3842849Z GITHUB_HEAD_REF= 2025-09-07T06:44:15.3842933Z GITHUB_ACTION_REF= 2025-09-07T06:44:15.3844089Z TEST_SHOWLOCALS=False 2025-09-07T06:44:15.3844190Z GITHUB_WORKFLOW=rocm-mi300 2025-09-07T06:44:15.3844297Z DEBIAN_FRONTEND=noninteractive 2025-09-07T06:44:15.3844493Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_91c6f083-d87f-4b90-8db9-5082ca97d3da 2025-09-07T06:44:15.3844692Z NO_TD=False 2025-09-07T06:44:15.3844774Z OLDPWD=/var/lib/jenkins 2025-09-07T06:44:15.3844864Z _=/usr/bin/env 2025-09-07T06:44:15.3844950Z + echo 'Testing pytorch' 2025-09-07T06:44:15.3845045Z Testing pytorch 2025-09-07T06:44:15.3845133Z + export LANG=C.UTF-8 2025-09-07T06:44:15.3845222Z + LANG=C.UTF-8 2025-09-07T06:44:15.3846351Z + PR_NUMBER= 2025-09-07T06:44:15.3846445Z + [[ default == \d\e\f\a\u\l\t ]] 2025-09-07T06:44:15.3846654Z + export CUDA_VISIBLE_DEVICES=0 2025-09-07T06:44:15.3846759Z + CUDA_VISIBLE_DEVICES=0 2025-09-07T06:44:15.3846858Z + export HIP_VISIBLE_DEVICES=0 2025-09-07T06:44:15.3846967Z + HIP_VISIBLE_DEVICES=0 2025-09-07T06:44:15.3847069Z + [[ default == 
\d\i\s\t\r\i\b\u\t\e\d ]] 2025-09-07T06:44:15.3847183Z + [[ default == \s\l\o\w ]] 2025-09-07T06:44:15.3847312Z + [[ linux-noble-rocm-py3.12-mi300 == *slow-gradcheck* ]] 2025-09-07T06:44:15.3847464Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-09-07T06:44:15.3847597Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-09-07T06:44:15.3848909Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T06:44:15.3849045Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-09-07T06:44:15.3849160Z + [[ default == *crossref* ]] 2025-09-07T06:44:15.3849275Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-09-07T06:44:15.3849393Z + export VALGRIND=OFF 2025-09-07T06:44:15.3849482Z + VALGRIND=OFF 2025-09-07T06:44:15.3849562Z + rocminfo 2025-09-07T06:44:15.3943767Z ROCk module version 6.12.12 is loaded 2025-09-07T06:44:15.4397153Z ===================== 2025-09-07T06:44:15.4397649Z HSA System Attributes 2025-09-07T06:44:15.4406926Z ===================== 2025-09-07T06:44:15.4407168Z Runtime Version: 1.15 2025-09-07T06:44:15.4407280Z Runtime Ext Version: 1.7 2025-09-07T06:44:15.4407402Z System Timestamp Freq.: 1000.000000MHz 2025-09-07T06:44:15.4407584Z Sig. 
Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-09-07T06:44:15.4407789Z Machine Model: LARGE 2025-09-07T06:44:15.4407966Z System Endianness: LITTLE 2025-09-07T06:44:15.4408183Z Mwaitx: DISABLED 2025-09-07T06:44:15.4408344Z XNACK enabled: NO 2025-09-07T06:44:15.4408454Z DMAbuf Support: YES 2025-09-07T06:44:15.4408570Z VMM Support: YES 2025-09-07T06:44:15.4408638Z 2025-09-07T06:44:15.4408677Z ========== 2025-09-07T06:44:15.4408778Z HSA Agents 2025-09-07T06:44:15.4412104Z ========== 2025-09-07T06:44:15.4412233Z ******* 2025-09-07T06:44:15.4412338Z Agent 1 2025-09-07T06:44:15.4412451Z ******* 2025-09-07T06:44:15.4412575Z Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:44:15.4412728Z Uuid: CPU-XX 2025-09-07T06:44:15.4412884Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:44:15.4413035Z Vendor Name: CPU 2025-09-07T06:44:15.4413297Z Feature: None specified 2025-09-07T06:44:15.4413446Z Profile: FULL_PROFILE 2025-09-07T06:44:15.4413593Z Float Round Mode: NEAR 2025-09-07T06:44:15.4413746Z Max Queue Number: 0(0x0) 2025-09-07T06:44:15.4413913Z Queue Min Size: 0(0x0) 2025-09-07T06:44:15.4414055Z Queue Max Size: 0(0x0) 2025-09-07T06:44:15.4414204Z Queue Type: MULTI 2025-09-07T06:44:15.4414345Z Node: 0 2025-09-07T06:44:15.4414488Z Device Type: CPU 2025-09-07T06:44:15.4414623Z Cache Info: 2025-09-07T06:44:15.4414978Z L1: 65536(0x10000) KB 2025-09-07T06:44:15.4415118Z Chip ID: 0(0x0) 2025-09-07T06:44:15.4415317Z ASIC Revision: 0(0x0) 2025-09-07T06:44:15.4415490Z Cacheline Size: 64(0x40) 2025-09-07T06:44:15.4415640Z Max Clock Freq. (MHz): 0 2025-09-07T06:44:15.4415801Z BDFID: 0 2025-09-07T06:44:15.4415944Z Internal Node ID: 0 2025-09-07T06:44:15.4416094Z Compute Unit: 80 2025-09-07T06:44:15.4416239Z SIMDs per CU: 0 2025-09-07T06:44:15.4416385Z Shader Engines: 0 2025-09-07T06:44:15.4416643Z Shader Arrs. per Eng.: 0 2025-09-07T06:44:15.4416819Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:44:15.4418497Z Memory Properties: 2025-09-07T06:44:15.4418613Z Features: None 2025-09-07T06:44:15.4418719Z Pool Info: 2025-09-07T06:44:15.4418818Z Pool 1 2025-09-07T06:44:15.4418945Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:44:15.4419094Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:44:15.4419257Z Allocatable: TRUE 2025-09-07T06:44:15.4419407Z Alloc Granule: 4KB 2025-09-07T06:44:15.4419633Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4419791Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4419944Z Accessible by all: TRUE 2025-09-07T06:44:15.4421293Z Pool 2 2025-09-07T06:44:15.4421420Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:44:15.4421569Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:44:15.4421707Z Allocatable: TRUE 2025-09-07T06:44:15.4421853Z Alloc Granule: 4KB 2025-09-07T06:44:15.4422006Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4422159Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4422308Z Accessible by all: TRUE 2025-09-07T06:44:15.4422448Z Pool 3 2025-09-07T06:44:15.4422583Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:44:15.4422723Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:44:15.4423886Z Allocatable: TRUE 2025-09-07T06:44:15.4424055Z Alloc Granule: 4KB 2025-09-07T06:44:15.4424260Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4424412Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4424561Z Accessible by all: TRUE 2025-09-07T06:44:15.4424691Z Pool 4 2025-09-07T06:44:15.4424810Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:44:15.4424951Z Size: 660245920(0x275a8da0) KB 2025-09-07T06:44:15.4425096Z Allocatable: TRUE 2025-09-07T06:44:15.4425243Z Alloc Granule: 4KB 2025-09-07T06:44:15.4425409Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4426686Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4426865Z Accessible by all: TRUE 2025-09-07T06:44:15.4427005Z ISA Info: 2025-09-07T06:44:15.4427103Z ******* 2025-09-07T06:44:15.4427197Z Agent 2 2025-09-07T06:44:15.4427289Z ******* 
2025-09-07T06:44:15.4427403Z Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:44:15.4427541Z Uuid: CPU-XX 2025-09-07T06:44:15.4427688Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:44:15.4427845Z Vendor Name: CPU 2025-09-07T06:44:15.4429001Z Feature: None specified 2025-09-07T06:44:15.4429148Z Profile: FULL_PROFILE 2025-09-07T06:44:15.4429303Z Float Round Mode: NEAR 2025-09-07T06:44:15.4429453Z Max Queue Number: 0(0x0) 2025-09-07T06:44:15.4429596Z Queue Min Size: 0(0x0) 2025-09-07T06:44:15.4429741Z Queue Max Size: 0(0x0) 2025-09-07T06:44:15.4429881Z Queue Type: MULTI 2025-09-07T06:44:15.4430015Z Node: 1 2025-09-07T06:44:15.4430150Z Device Type: CPU 2025-09-07T06:44:15.4430278Z Cache Info: 2025-09-07T06:44:15.4430388Z L1: 65536(0x10000) KB 2025-09-07T06:44:15.4431542Z Chip ID: 0(0x0) 2025-09-07T06:44:15.4431689Z ASIC Revision: 0(0x0) 2025-09-07T06:44:15.4431834Z Cacheline Size: 64(0x40) 2025-09-07T06:44:15.4431998Z Max Clock Freq. (MHz): 0 2025-09-07T06:44:15.4432136Z BDFID: 0 2025-09-07T06:44:15.4432278Z Internal Node ID: 1 2025-09-07T06:44:15.4432422Z Compute Unit: 80 2025-09-07T06:44:15.4432563Z SIMDs per CU: 0 2025-09-07T06:44:15.4432725Z Shader Engines: 0 2025-09-07T06:44:15.4432874Z Shader Arrs. per Eng.: 0 2025-09-07T06:44:15.4433026Z WatchPts on Addr. 
Ranges:1 2025-09-07T06:44:15.4433165Z Memory Properties: 2025-09-07T06:44:15.4434287Z Features: None 2025-09-07T06:44:15.4434392Z Pool Info: 2025-09-07T06:44:15.4434489Z Pool 1 2025-09-07T06:44:15.4434614Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:44:15.4434759Z Size: 656328592(0x271ec790) KB 2025-09-07T06:44:15.4434901Z Allocatable: TRUE 2025-09-07T06:44:15.4435102Z Alloc Granule: 4KB 2025-09-07T06:44:15.4435256Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4435411Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4435563Z Accessible by all: TRUE 2025-09-07T06:44:15.4435693Z Pool 2 2025-09-07T06:44:15.4435816Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:44:15.4437012Z Size: 656328592(0x271ec790) KB 2025-09-07T06:44:15.4437153Z Allocatable: TRUE 2025-09-07T06:44:15.4437442Z Alloc Granule: 4KB 2025-09-07T06:44:15.4437595Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4437748Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4437902Z Accessible by all: TRUE 2025-09-07T06:44:15.4438032Z Pool 3 2025-09-07T06:44:15.4438154Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-09-07T06:44:15.4438294Z Size: 656328592(0x271ec790) KB 2025-09-07T06:44:15.4438433Z Allocatable: TRUE 2025-09-07T06:44:15.4438580Z Alloc Granule: 4KB 2025-09-07T06:44:15.4439714Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4439867Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4440018Z Accessible by all: TRUE 2025-09-07T06:44:15.4440147Z Pool 4 2025-09-07T06:44:15.4440266Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:44:15.4440412Z Size: 656328592(0x271ec790) KB 2025-09-07T06:44:15.4440549Z Allocatable: TRUE 2025-09-07T06:44:15.4440696Z Alloc Granule: 4KB 2025-09-07T06:44:15.4440851Z Alloc Recommended Granule:4KB 2025-09-07T06:44:15.4441003Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4442065Z Accessible by all: TRUE 2025-09-07T06:44:15.4442318Z ISA Info: 2025-09-07T06:44:15.4442415Z ******* 2025-09-07T06:44:15.4442509Z Agent 3 2025-09-07T06:44:15.4442599Z ******* 
2025-09-07T06:44:15.4442705Z Name: gfx942 2025-09-07T06:44:15.4442842Z Uuid: GPU-d52b70587e52af6d 2025-09-07T06:44:15.4442993Z Marketing Name: AMD Instinct Mi325X VF 2025-09-07T06:44:15.4443142Z Vendor Name: AMD 2025-09-07T06:44:15.4443286Z Feature: KERNEL_DISPATCH 2025-09-07T06:44:15.4443429Z Profile: BASE_PROFILE 2025-09-07T06:44:15.4444499Z Float Round Mode: NEAR 2025-09-07T06:44:15.4444669Z Max Queue Number: 128(0x80) 2025-09-07T06:44:15.4444817Z Queue Min Size: 64(0x40) 2025-09-07T06:44:15.4444959Z Queue Max Size: 131072(0x20000) 2025-09-07T06:44:15.4445099Z Queue Type: MULTI 2025-09-07T06:44:15.4445232Z Node: 2 2025-09-07T06:44:15.4445367Z Device Type: GPU 2025-09-07T06:44:15.4445493Z Cache Info: 2025-09-07T06:44:15.4445679Z L1: 32(0x20) KB 2025-09-07T06:44:15.4445807Z L2: 4096(0x1000) KB 2025-09-07T06:44:15.4445931Z L3: 262144(0x40000) KB 2025-09-07T06:44:15.4447193Z Chip ID: 29881(0x74b9) 2025-09-07T06:44:15.4447337Z ASIC Revision: 1(0x1) 2025-09-07T06:44:15.4447481Z Cacheline Size: 128(0x80) 2025-09-07T06:44:15.4447652Z Max Clock Freq. (MHz): 2100 2025-09-07T06:44:15.4447791Z BDFID: 37632 2025-09-07T06:44:15.4447930Z Internal Node ID: 2 2025-09-07T06:44:15.4448073Z Compute Unit: 304 2025-09-07T06:44:15.4448212Z SIMDs per CU: 4 2025-09-07T06:44:15.4448359Z Shader Engines: 32 2025-09-07T06:44:15.4448507Z Shader Arrs. per Eng.: 1 2025-09-07T06:44:15.4448657Z WatchPts on Addr. 
Ranges:4 2025-09-07T06:44:15.4449771Z Coherent Host Access: FALSE 2025-09-07T06:44:15.4449906Z Memory Properties: 2025-09-07T06:44:15.4450017Z Features: KERNEL_DISPATCH 2025-09-07T06:44:15.4450154Z Fast F16 Operation: TRUE 2025-09-07T06:44:15.4450308Z Wavefront Size: 64(0x40) 2025-09-07T06:44:15.4450456Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:44:15.4450593Z Workgroup Max Size per Dimension: 2025-09-07T06:44:15.4450712Z x 1024(0x400) 2025-09-07T06:44:15.4450834Z y 1024(0x400) 2025-09-07T06:44:15.4450958Z z 1024(0x400) 2025-09-07T06:44:15.4451091Z Max Waves Per CU: 32(0x20) 2025-09-07T06:44:15.4452171Z Max Work-item Per CU: 2048(0x800) 2025-09-07T06:44:15.4452320Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:44:15.4452450Z Grid Max Size per Dimension: 2025-09-07T06:44:15.4452561Z x 4294967295(0xffffffff) 2025-09-07T06:44:15.4452738Z y 4294967295(0xffffffff) 2025-09-07T06:44:15.4452861Z z 4294967295(0xffffffff) 2025-09-07T06:44:15.4453001Z Max fbarriers/Workgrp: 32 2025-09-07T06:44:15.4453194Z Packet Processor uCode:: 177 2025-09-07T06:44:15.4453349Z SDMA engine uCode:: 24 2025-09-07T06:44:15.4453502Z IOMMU Support:: None 2025-09-07T06:44:15.4453631Z Pool Info: 2025-09-07T06:44:15.4454685Z Pool 1 2025-09-07T06:44:15.4454811Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-09-07T06:44:15.4454958Z Size: 268107776(0xffb0000) KB 2025-09-07T06:44:15.4455100Z Allocatable: TRUE 2025-09-07T06:44:15.4455248Z Alloc Granule: 4KB 2025-09-07T06:44:15.4455409Z Alloc Recommended Granule:2048KB 2025-09-07T06:44:15.4455564Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4455716Z Accessible by all: FALSE 2025-09-07T06:44:15.4455849Z Pool 2 2025-09-07T06:44:15.4455972Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-09-07T06:44:15.4456163Z Size: 268107776(0xffb0000) KB 2025-09-07T06:44:15.4457405Z Allocatable: TRUE 2025-09-07T06:44:15.4457555Z Alloc Granule: 4KB 2025-09-07T06:44:15.4457707Z Alloc Recommended Granule:2048KB 2025-09-07T06:44:15.4457860Z Alloc Alignment: 4KB 
2025-09-07T06:44:15.4458010Z Accessible by all: FALSE 2025-09-07T06:44:15.4458149Z Pool 3 2025-09-07T06:44:15.4458268Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-09-07T06:44:15.4458408Z Size: 268107776(0xffb0000) KB 2025-09-07T06:44:15.4458551Z Allocatable: TRUE 2025-09-07T06:44:15.4458699Z Alloc Granule: 4KB 2025-09-07T06:44:15.4458854Z Alloc Recommended Granule:2048KB 2025-09-07T06:44:15.4460173Z Alloc Alignment: 4KB 2025-09-07T06:44:15.4460324Z Accessible by all: FALSE 2025-09-07T06:44:15.4460457Z Pool 4 2025-09-07T06:44:15.4460572Z Segment: GROUP 2025-09-07T06:44:15.4460708Z Size: 64(0x40) KB 2025-09-07T06:44:15.4460851Z Allocatable: FALSE 2025-09-07T06:44:15.4460999Z Alloc Granule: 0KB 2025-09-07T06:44:15.4461151Z Alloc Recommended Granule:0KB 2025-09-07T06:44:15.4461304Z Alloc Alignment: 0KB 2025-09-07T06:44:15.4461454Z Accessible by all: FALSE 2025-09-07T06:44:15.4461591Z ISA Info: 2025-09-07T06:44:15.4462630Z ISA 1 2025-09-07T06:44:15.4462757Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-09-07T06:44:15.4462916Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:44:15.4463071Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:44:15.4463223Z Default Rounding Mode: NEAR 2025-09-07T06:44:15.4463378Z Default Rounding Mode: NEAR 2025-09-07T06:44:15.4463573Z Fast f16: TRUE 2025-09-07T06:44:15.4463720Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:44:15.4463861Z Workgroup Max Size per Dimension: 2025-09-07T06:44:15.4463987Z x 1024(0x400) 2025-09-07T06:44:15.4464113Z y 1024(0x400) 2025-09-07T06:44:15.4465233Z z 1024(0x400) 2025-09-07T06:44:15.4465369Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:44:15.4465503Z Grid Max Size per Dimension: 2025-09-07T06:44:15.4465619Z x 4294967295(0xffffffff) 2025-09-07T06:44:15.4465744Z y 4294967295(0xffffffff) 2025-09-07T06:44:15.4465866Z z 4294967295(0xffffffff) 2025-09-07T06:44:15.4466009Z FBarrier Max Size: 32 2025-09-07T06:44:15.4466138Z ISA 2 2025-09-07T06:44:15.4466274Z Name: 
amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-09-07T06:44:15.4466443Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-09-07T06:44:15.4466688Z Profiles: HSA_PROFILE_BASE 2025-09-07T06:44:15.4467920Z Default Rounding Mode: NEAR 2025-09-07T06:44:15.4468077Z Default Rounding Mode: NEAR 2025-09-07T06:44:15.4468222Z Fast f16: TRUE 2025-09-07T06:44:15.4468366Z Workgroup Max Size: 1024(0x400) 2025-09-07T06:44:15.4468503Z Workgroup Max Size per Dimension: 2025-09-07T06:44:15.4468623Z x 1024(0x400) 2025-09-07T06:44:15.4468753Z y 1024(0x400) 2025-09-07T06:44:15.4468874Z z 1024(0x400) 2025-09-07T06:44:15.4469009Z Grid Max Size: 4294967295(0xffffffff) 2025-09-07T06:44:15.4469140Z Grid Max Size per Dimension: 2025-09-07T06:44:15.4469255Z x 4294967295(0xffffffff) 2025-09-07T06:44:15.4470345Z y 4294967295(0xffffffff) 2025-09-07T06:44:15.4470470Z z 4294967295(0xffffffff) 2025-09-07T06:44:15.4470608Z FBarrier Max Size: 32 2025-09-07T06:44:15.4470756Z *** Done *** 2025-09-07T06:44:15.4470850Z + rocminfo 2025-09-07T06:44:15.4470941Z + grep -E 'Name:.*\sgfx|Marketing' 2025-09-07T06:44:15.5017833Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:44:15.5018118Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-09-07T06:44:15.5019031Z Name: gfx942 2025-09-07T06:44:15.5019555Z Marketing Name: AMD Instinct Mi325X VF 2025-09-07T06:44:15.5068331Z + MAYBE_ROCM=rocm/ 2025-09-07T06:44:15.5068714Z + [[ linux-noble-rocm-py3.12-mi300 == *xpu* ]] 2025-09-07T06:44:15.5069117Z + [[ linux-noble-rocm-py3.12-mi300 != *-bazel-* ]] 2025-09-07T06:44:15.5069279Z + pip_install ninja==1.10.2 2025-09-07T06:44:15.5069433Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-09-07T06:44:15.5069609Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-09-07T06:44:15.7392902Z Collecting ninja==1.10.2 2025-09-07T06:44:15.8113926Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-09-07T06:44:15.8414981Z Downloading 
ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-09-07T06:44:15.9752478Z Installing collected packages: ninja 2025-09-07T06:44:15.9753183Z Attempting uninstall: ninja 2025-09-07T06:44:15.9762833Z Found existing installation: ninja 1.11.1.3 2025-09-07T06:44:15.9772468Z Uninstalling ninja-1.11.1.3: 2025-09-07T06:44:15.9831252Z Successfully uninstalled ninja-1.11.1.3 2025-09-07T06:44:15.9932872Z Successfully installed ninja-1.10.2 2025-09-07T06:44:16.0245104Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:44:16.0245951Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-09-07T06:44:16.0246377Z + [[ linux-noble-rocm-py3.12-mi300 == *aarch64* ]] 2025-09-07T06:44:16.0246763Z + [[ linux-noble-rocm-py3.12-mi300 == *asan* ]] 2025-09-07T06:44:16.0246911Z + [[ linux-noble-rocm-py3.12-mi300 == *-debug* ]] 2025-09-07T06:44:16.0247063Z + [[ linux-noble-rocm-py3.12-mi300 != *-bazel-* ]] 2025-09-07T06:44:16.0247274Z + echo 'We are not in debug mode: linux-noble-rocm-py3.12-mi300. Expect the assertion to pass' 2025-09-07T06:44:16.0247523Z We are not in debug mode: linux-noble-rocm-py3.12-mi300. 
Expect the assertion to pass 2025-09-07T06:44:16.0248033Z + cd test 2025-09-07T06:44:16.0248300Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-09-07T06:44:18.0537773Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-09-07T06:44:18.0538122Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-09-07T06:44:18.0538347Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-09-07T06:44:18.0538571Z + DYNAMO_BENCHMARK_FLAGS=() 2025-09-07T06:44:18.0538740Z + [[ default == *pr_time_benchmarks* ]] 2025-09-07T06:44:18.0538933Z + [[ default == *dynamo_eager* ]] 2025-09-07T06:44:18.0546667Z + [[ default == *aot_eager* ]] 2025-09-07T06:44:18.0546864Z + [[ default == *aot_inductor* ]] 2025-09-07T06:44:18.0547078Z + [[ default == *max_autotune_inductor* ]] 2025-09-07T06:44:18.0547303Z + [[ default == *inductor* ]] 2025-09-07T06:44:18.0547498Z + [[ default == *dynamic* ]] 2025-09-07T06:44:18.0547697Z + [[ default == *cpu* ]] 2025-09-07T06:44:18.0547904Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-09-07T06:44:18.0548137Z + [[ linux-noble-rocm-py3.12-mi300 == *libtorch* ]] 2025-09-07T06:44:18.0548398Z + [[ linux-noble-rocm-py3.12-mi300 == *-bazel-* ]] 2025-09-07T06:44:18.0550691Z + cd test 2025-09-07T06:44:18.0551748Z + python -c 'import torch; print(torch.__config__.show())' 2025-09-07T06:44:18.8877817Z PyTorch built with: 2025-09-07T06:44:18.8878231Z - GCC 11.5 2025-09-07T06:44:18.8878486Z - C++ Version: 201703 2025-09-07T06:44:18.8887940Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T06:44:18.8888736Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T06:44:18.8889178Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-09-07T06:44:18.8889313Z - LAPACK is enabled (usually provided by MKL) 2025-09-07T06:44:18.8889438Z - NNPACK is enabled 2025-09-07T06:44:18.8889547Z - CPU capability usage: AVX512 2025-09-07T06:44:18.8889664Z - HIP Runtime 6.4.43484 2025-09-07T06:44:18.8889771Z - MIOpen 3.4.0 2025-09-07T06:44:18.8889878Z - Magma 2.7.2 2025-09-07T06:44:18.8891693Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=93fb23d6fae7c4e82c4239a1033e522088742634, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_FBGEMM_GENAI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.9.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-09-07T06:44:18.8894736Z 2025-09-07T06:44:19.0955781Z + cd test 2025-09-07T06:44:19.0957561Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-09-07T06:44:19.7587609Z ATen/Parallel: 2025-09-07T06:44:19.7587899Z at::get_num_threads() : 160 2025-09-07T06:44:19.7588123Z at::get_num_interop_threads() : 160 2025-09-07T06:44:19.7588339Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-09-07T06:44:19.7588588Z omp_get_max_threads() : 160 2025-09-07T06:44:19.7588932Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-09-07T06:44:19.7589270Z mkl_get_max_threads() : 160 2025-09-07T06:44:19.7595313Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-09-07T06:44:19.7595602Z std::thread::hardware_concurrency() : 160 2025-09-07T06:44:19.7596142Z Environment variables: 2025-09-07T06:44:19.7596313Z OMP_NUM_THREADS : [not set] 2025-09-07T06:44:19.7596478Z MKL_NUM_THREADS : [not set] 2025-09-07T06:44:19.7596778Z ATen parallel backend: OpenMP 2025-09-07T06:44:19.7596890Z 2025-09-07T06:44:19.9823641Z + [[ default == *numpy_2* ]] 2025-09-07T06:44:19.9824121Z + [[ linux-noble-rocm-py3.12-mi300 == *aarch64* ]] 2025-09-07T06:44:19.9824518Z + [[ default == *backward* ]] 2025-09-07T06:44:19.9824814Z + [[ default == *xla* ]] 2025-09-07T06:44:19.9825085Z + [[ default == *vllm* ]] 2025-09-07T06:44:19.9831181Z + [[ default == *executorch* ]] 2025-09-07T06:44:19.9831415Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2025-09-07T06:44:19.9831687Z + [[ linux-noble-rocm-py3.12-mi300 == *libtorch* ]] 2025-09-07T06:44:19.9831951Z + [[ default == distributed ]] 2025-09-07T06:44:19.9832168Z + [[ default == *operator_benchmark* ]] 2025-09-07T06:44:19.9832407Z + [[ default == *inductor_distributed* ]] 2025-09-07T06:44:19.9832640Z + [[ default == *inductor-halide* ]] 2025-09-07T06:44:19.9832883Z + [[ default == *inductor-triton-cpu* ]] 2025-09-07T06:44:19.9833138Z + [[ default == *inductor-micro-benchmark* ]] 2025-09-07T06:44:19.9833376Z + [[ default == *huggingface* ]] 2025-09-07T06:44:19.9836339Z + [[ default == *timm* ]] 2025-09-07T06:44:19.9836704Z + [[ default == cachebench ]] 2025-09-07T06:44:19.9836913Z + [[ default == verify_cachebench ]] 2025-09-07T06:44:19.9837136Z + [[ default == *torchbench* ]] 2025-09-07T06:44:19.9837478Z + [[ default == *inductor_cpp_wrapper* 
]] 2025-09-07T06:44:19.9837704Z + [[ default == *inductor* ]] 2025-09-07T06:44:19.9837906Z + [[ default == *einops* ]] 2025-09-07T06:44:19.9838109Z + [[ default == *dynamo_wrapped* ]] 2025-09-07T06:44:19.9838351Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-09-07T06:44:19.9838594Z + [[ -n '' ]] 2025-09-07T06:44:19.9838754Z + [[ 6 == 1 ]] 2025-09-07T06:44:19.9840664Z + [[ 6 == 2 ]] 2025-09-07T06:44:19.9840789Z + [[ 6 -gt 2 ]] 2025-09-07T06:44:19.9840920Z + install_torchvision 2025-09-07T06:44:19.9841045Z + local orig_preload 2025-09-07T06:44:19.9841179Z + local commit 2025-09-07T06:44:19.9841310Z ++ get_pinned_commit vision 2025-09-07T06:44:19.9841459Z ++ cat .github/ci_commit_pins/vision.txt 2025-09-07T06:44:19.9841648Z + commit=966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:19.9841815Z + orig_preload= 2025-09-07T06:44:19.9841934Z + '[' -n '' ']' 2025-09-07T06:44:19.9842065Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-09-07T06:44:19.9843979Z + pip_build_and_install git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 dist/vision 2025-09-07T06:44:19.9844627Z + local build_target=git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:19.9844917Z + local wheel_dir=dist/vision 2025-09-07T06:44:19.9845054Z + local found_whl=0 2025-09-07T06:44:19.9845186Z + for file in "${wheel_dir}"/*.whl 2025-09-07T06:44:19.9845344Z + [[ -f dist/vision/*.whl ]] 2025-09-07T06:44:19.9845484Z + '[' 0 == 0 ']' 2025-09-07T06:44:19.9845857Z + python3 -m pip wheel --no-build-isolation --no-deps --no-use-pep517 -w dist/vision git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:20.1222971Z Collecting git+https://github.com/pytorch/vision.git@966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:20.1225258Z Cloning https://github.com/pytorch/vision.git (to revision 966da7e46f65d6d49df3e31214470a4fe5cc8e66) to /tmp/pip-req-build-t2dya2l3 
2025-09-07T06:44:20.1237025Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-t2dya2l3 2025-09-07T06:44:24.9364734Z Running command git rev-parse -q --verify 'sha^966da7e46f65d6d49df3e31214470a4fe5cc8e66' 2025-09-07T06:44:24.9379852Z Running command git fetch -q https://github.com/pytorch/vision.git 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:25.5576095Z Running command git checkout -q 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:25.8899008Z Resolved https://github.com/pytorch/vision.git to commit 966da7e46f65d6d49df3e31214470a4fe5cc8e66 2025-09-07T06:44:27.3103567Z Preparing metadata (setup.py) ... done 2025-09-07T06:44:27.3119668Z Building wheels for collected packages: torchvision 2025-09-07T06:44:27.3869145Z DEPRECATION: Building 'torchvision' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'torchvision'. Discussion can be found at https://github.com/pypa/pip/issues/6334 2025-09-07T06:44:56.2949691Z Building wheel for torchvision (setup.py) ...
done 2025-09-07T06:44:56.2965546Z Created wheel for torchvision: filename=torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl size=1581390 sha256=a7cbcb4ad10aa654539a7abb123b9bdcc6cc7edcee7d8fc26da3bf9baff00d8f 2025-09-07T06:44:56.2966311Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/10/ba/61/eb5228b3631dc6bb4f478b3aa59575551a5473e4596e4c001a 2025-09-07T06:44:56.2991085Z Successfully built torchvision 2025-09-07T06:44:56.3610125Z + for file in "${wheel_dir}"/*.whl 2025-09-07T06:44:56.3610498Z + pip_install_whl dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl 2025-09-07T06:44:56.3610968Z + args=('dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl') 2025-09-07T06:44:56.3611262Z + local args 2025-09-07T06:44:56.3611522Z + [[ dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl == *\ * ]] 2025-09-07T06:44:56.3618544Z + for path in "${args[@]}" 2025-09-07T06:44:56.3618873Z + echo 'Installing dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl' 2025-09-07T06:44:56.3619298Z Installing dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl 2025-09-07T06:44:56.3619779Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl 2025-09-07T06:44:56.5173326Z Processing ./dist/vision/torchvision-0.22.0a0+966da7e-cp312-cp312-linux_x86_64.whl 2025-09-07T06:44:56.5216777Z Installing collected packages: torchvision 2025-09-07T06:44:56.8334989Z Successfully installed torchvision-0.22.0a0+966da7e 2025-09-07T06:44:56.8653857Z + '[' -n '' ']' 2025-09-07T06:44:56.8654924Z + test_python_shard 6 2025-09-07T06:44:56.8661113Z + [[ -z 6 ]] 2025-09-07T06:44:56.8661428Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 6 6 --verbose --upload-artifacts-while-running 2025-09-07T06:44:58.2163978Z
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:44:58.2165330Z import pkg_resources 2025-09-07T06:44:58.5499362Z Excluding test_cuda_nvml_based_avail on ROCm 2025-09-07T06:44:58.5635181Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-09-07T06:44:58.7394029Z Ignoring disabled issues: [''] 2025-09-07T06:44:58.7458428Z Found test times from artifacts 2025-09-07T06:44:58.7711388Z Found test times from artifacts 2025-09-07T06:44:58.7718177Z Running all tests 2025-09-07T06:44:58.7891469Z Running parallel tests on 1 processes 2025-09-07T06:44:58.7892459Z Name: tests to run (est. time: 154.11min) 2025-09-07T06:44:58.7892843Z Serial tests (83): 2025-09-07T06:44:58.7893171Z inductor/test_extension_backend 1/1 2025-09-07T06:44:58.7893524Z inductor/test_inplace_padding 1/1 2025-09-07T06:44:58.7893887Z inductor/test_max_autotune 1/1 2025-09-07T06:44:58.7894827Z dynamo/test_flat_apply 1/1 2025-09-07T06:44:58.7903791Z dynamo/test_frame_init 1/1 2025-09-07T06:44:58.7904016Z dynamo/test_functions 1/1 2025-09-07T06:44:58.7904191Z dynamo/test_generator 1/1 2025-09-07T06:44:58.7904358Z dynamo/test_global 1/1 2025-09-07T06:44:58.7904537Z dynamo/test_graph_region_tracker 1/1 2025-09-07T06:44:58.7904743Z dynamo/test_guard_manager 1/1 2025-09-07T06:44:58.7904926Z dynamo/test_higher_order_ops 1/1 2025-09-07T06:44:58.7905101Z dynamo/test_input_attr_tracking 1/1 2025-09-07T06:44:58.7905303Z dynamo/test_install_free_tensors 1/1 2025-09-07T06:44:58.7905507Z dynamo/test_nested_graph_breaks 1/1 2025-09-07T06:44:58.7905701Z dynamo/test_nops 1/1 2025-09-07T06:44:58.7905864Z dynamo/test_optimizers 1/1 
2025-09-07T06:44:58.7906040Z dynamo/test_pgo 1/1 2025-09-07T06:44:58.7906205Z dynamo/test_pre_dispatch 1/1 2025-09-07T06:44:58.7906386Z dynamo/test_precompile_context 1/1 2025-09-07T06:44:58.7908994Z dynamo/test_profiler 1/1 2025-09-07T06:44:58.7909186Z dynamo/test_python_autograd 1/1 2025-09-07T06:44:58.7909365Z dynamo/test_python_dispatcher 1/1 2025-09-07T06:44:58.7909545Z dynamo/test_recompile_ux 1/1 2025-09-07T06:44:58.7909725Z dynamo/test_reconstruct 1/1 2025-09-07T06:44:58.7909896Z dynamo/test_reorder_logs 1/1 2025-09-07T06:44:58.7910057Z dynamo/test_repros 1/1 2025-09-07T06:44:58.7910213Z export/test_draft_export 1/1 2025-09-07T06:44:58.7910376Z export/test_export_strict 1/1 2025-09-07T06:44:58.7910556Z export/test_schema 1/1 2025-09-07T06:44:58.7910700Z export/test_serdes 1/1 2025-09-07T06:44:58.7912106Z functorch/test_ops 3/4 2025-09-07T06:44:58.7912235Z inductor/test_auto_functionalize 1/1 2025-09-07T06:44:58.7912378Z inductor/test_autoheuristic 1/1 2025-09-07T06:44:58.7912516Z inductor/test_benchmark_fusion 1/1 2025-09-07T06:44:58.7912645Z inductor/test_compile 1/1 2025-09-07T06:44:58.7912776Z inductor/test_compile_subprocess 1/2 2025-09-07T06:44:58.7912917Z inductor/test_config 1/1 2025-09-07T06:44:58.7913038Z inductor/test_control_flow 1/2 2025-09-07T06:44:58.7913168Z inductor/test_cpu_repro 4/5 2025-09-07T06:44:58.7913296Z inductor/test_flex_decoding 1/2 2025-09-07T06:44:58.7913424Z inductor/test_fx_fusion 1/1 2025-09-07T06:44:58.7914674Z inductor/test_gpu_cpp_wrapper 1/1 2025-09-07T06:44:58.7914818Z inductor/test_layout_optim 1/1 2025-09-07T06:44:58.7914977Z inductor/test_torchinductor_codegen_dynamic_shapes 3/4 2025-09-07T06:44:58.7915325Z inductor/test_torchinductor_opinfo 1/9 2025-09-07T06:44:58.7915468Z inductor/test_torchinductor_opinfo 7/9 2025-09-07T06:44:58.7915606Z inductor/test_xpu_basic 1/1 2025-09-07T06:44:58.7915724Z nn/test_pooling 1/1 2025-09-07T06:44:58.7915846Z profiler/test_profiler_tree 1/1 2025-09-07T06:44:58.7915975Z 
profiler/test_python_tracer 1/1 2025-09-07T06:44:58.7916111Z profiler/test_record_function 1/1 2025-09-07T06:44:58.7916252Z profiler/test_torch_tidy 1/1 2025-09-07T06:44:58.7917773Z test_accelerator 1/1 2025-09-07T06:44:58.7917896Z test_autocast 1/1 2025-09-07T06:44:58.7918009Z test_autograd_fallback 1/1 2025-09-07T06:44:58.7918125Z test_autoload 1/1 2025-09-07T06:44:58.7918237Z test_binary_ufuncs 1/1 2025-09-07T06:44:58.7918358Z test_ci_sanity_check_fail 1/1 2025-09-07T06:44:58.7918478Z test_decomp 2/12 2025-09-07T06:44:58.7918583Z test_decomp 8/12 2025-09-07T06:44:58.7918691Z test_function_schema 1/1 2025-09-07T06:44:58.7918829Z test_functional_autograd_benchmark 1/1 2025-09-07T06:44:58.7918970Z test_functional_optim 1/1 2025-09-07T06:44:58.7920204Z test_functionalization 1/1 2025-09-07T06:44:58.7920343Z test_functionalization_of_rng_ops 1/1 2025-09-07T06:44:58.7920473Z test_futures 1/1 2025-09-07T06:44:58.7920576Z test_fx 1/3 2025-09-07T06:44:58.7920673Z test_meta 1/2 2025-09-07T06:44:58.7920775Z test_numa_binding 1/1 2025-09-07T06:44:58.7920961Z test_numba_integration 1/1 2025-09-07T06:44:58.7921082Z test_numpy_interop 1/1 2025-09-07T06:44:58.7921194Z test_openmp 1/1 2025-09-07T06:44:58.7921296Z test_openreg 1/1 2025-09-07T06:44:58.7921399Z test_ops 1/4 2025-09-07T06:44:58.7922468Z test_ops_gradients 2/2 2025-09-07T06:44:58.7922570Z test_quantization 4/5 2025-09-07T06:44:58.7922665Z test_stateless 1/1 2025-09-07T06:44:58.7922758Z test_sympy_utils 1/1 2025-09-07T06:44:58.7922852Z test_tensorboard 1/1 2025-09-07T06:44:58.7922945Z test_tensorexpr 1/1 2025-09-07T06:44:58.7923044Z test_tensorexpr_pybind 1/1 2025-09-07T06:44:58.7923142Z test_testing 1/1 2025-09-07T06:44:58.7923233Z test_transformers 1/1 2025-09-07T06:44:58.7923330Z Parallel tests (0): 2025-09-07T06:44:58.7923430Z Name: excluded (est. 
time: 0.0min) 2025-09-07T06:44:58.7924902Z Serial tests (0): 2025-09-07T06:44:58.7924997Z Parallel tests (0): 2025-09-07T06:44:58.7925141Z Running inductor/test_extension_backend 1/1 ... [2025-09-07 06:44:58.789537] 2025-09-07T06:44:58.7925313Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:44:58.7925706Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_extension_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:44:58.789757] 2025-09-07T06:45:19.2397880Z 2025-09-07T06:45:19.2398890Z inductor/test_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_extension_backend_1.1_b70a4ddd4d2a159a_.log 2025-09-07T06:45:19.2399851Z Running 1 items in this shard: test/inductor/test_extension_backend.py::ExtensionBackendTests::test_open_device_registration 2025-09-07T06:45:19.2400230Z 2025-09-07T06:45:19.2400427Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T06:45:19.2400785Z Uploading artifacts took 0.00 seconds 2025-09-07T06:45:19.2401143Z Running inductor/test_inplace_padding 1/1 ... [2025-09-07 06:45:19.239807] 2025-09-07T06:45:19.2402404Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:45:19.2402950Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_inplace_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:45:19.240042] 2025-09-07T06:45:40.4454774Z 2025-09-07T06:45:40.4460445Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_8fe6c2261503c73b_.log 2025-09-07T06:45:40.4462349Z Running 9 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input 2025-09-07T06:45:40.4463966Z 2025-09-07T06:45:40.4464104Z Running inductor/test_max_autotune 1/1 ... [2025-09-07 06:45:40.445286] 2025-09-07T06:45:40.4464327Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:45:40.4498869Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_max_autotune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:45:40.445474] 2025-09-07T06:54:29.0826817Z 2025-09-07T06:54:29.0835700Z inductor/test_max_autotune 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_max_autotune_1.1_7881b5d74dfa7e07_.log 2025-09-07T06:54:29.0862645Z Running 181 items in this shard: test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_conv1x1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_device_guard, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_addmm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_addmm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_baddbmm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_baddbmm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_bmm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_bmm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_plus_mm_max_autotune_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_autotune_gemm_choice_validation_op_mm_plus_mm_max_autotune_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_baddmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_cat_addmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_cat_max_autotune_extern, test/inductor/test_max_autotune.py::TestMaxAutotune::test_cat_max_autotune_triton, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv1x1_with_free_symbols, test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv3d, test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv_backend, test/inductor/test_max_autotune.py::TestMaxAutotune::test_conv_cat, test/inductor/test_max_autotune.py::TestMaxAutotune::test_empty_conv_input, test/inductor/test_max_autotune.py::TestMaxAutotune::test_empty_conv_input_with_1x1_kernel, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout0_op_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout0_op_scaled_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_0_op_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_0_op_scaled_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_27_op_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_honor_sm_carveout_with_triton_tma_carveout_27_op_scaled_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_inf_timing_multi_template_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_inf_timing_multi_template_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_jit_fusion_matches_aot_fusion, test/inductor/test_max_autotune.py::TestMaxAutotune::test_linear_and_cel, test/inductor/test_max_autotune.py::TestMaxAutotune::test_matmul_dropout_device_cpu, test/inductor/test_max_autotune.py::TestMaxAutotune::test_matmul_dropout_device_cuda, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_True, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_illegal_alignment_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_persistent_tma_illegal_alignment_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_tma_dynamic_outer_dim, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_zero_size_input_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_addmm_zero_size_input_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_bfloat16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float16_sizes1, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float32_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float32_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_addmm_float32_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_bfloat16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float32_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float32_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_mm_float32_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_non_contiguous_second_matrix_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_non_contiguous_second_matrix_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_contiguous_transform_with_epilogue, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes0, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_float16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_False_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_bfloat16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_bfloat16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_bfloat16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_float16_sizes0, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_float16_sizes1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_True_float16_sizes2, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_input, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_dynamic_input_bwd, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_0_decompose_k_threshold_16, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_0_decompose_k_threshold_8, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_20_decompose_k_threshold_16, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_20_decompose_k_threshold_8, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_5_decompose_k_threshold_16, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_envvars_num_decompose_k_splits_5_decompose_k_threshold_8, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_decompose_k_output_stride, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_exhaustive, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_mm_plus_mm_zero_size_input_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_mm_plus_mm_zero_size_input_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_prune_choices, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_False_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_False, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_a_transposed_True_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_illegal_alignment_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_illegal_alignment_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_False_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_False_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_False_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_True_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_persistent_tma_strided_a_transposed_True_b_transposed_True_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_tma_dynamic_outer_dim, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_zero_size_input_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotune::test_max_autotune_regular_mm_zero_size_input_dynamic_True, 
test/inductor/test_max_autotune.py::TestMaxAutotune::test_mm_k_1, test/inductor/test_max_autotune.py::TestMaxAutotune::test_mutation_rename, test/inductor/test_max_autotune.py::TestMaxAutotune::test_no_valid_choices, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_addmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_bmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_non_contiguous_input_mm_plus_mm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_cache_key, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_cache_strategy, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_caching, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_caching_bmm, test/inductor/test_max_autotune.py::TestMaxAutotune::test_triton_template_generated_code_caching_mm_plus_mm, test/inductor/test_max_autotune.py::TestMaxAutotunePrecompile::test_filled_cache_precompile, test/inductor/test_max_autotune.py::TestMaxAutotunePrecompile::test_precompilation_threads, test/inductor/test_max_autotune.py::TestMaxAutotunePrecompile::test_precompilations, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_benchmark_choice_fail_in_subproc, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_benchmark_choice_in_subproc, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_addmm_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_addmm_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_False_autotune_multi_device_False, 
test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_False_autotune_multi_device_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_True_autotune_multi_device_False, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_mm_plus_mm_autotune_in_subproc_True_autotune_multi_device_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_regular_mm_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_regular_mm_dynamic_True, test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_triton_template_with_epilogues_and_dynamic_shape, test/inductor/test_max_autotune.py::TestMaxAutotuneRemoteCache::test_max_autotune_remote_caching_dynamic_False, test/inductor/test_max_autotune.py::TestMaxAutotuneRemoteCache::test_max_autotune_remote_caching_dynamic_True, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_crash, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_exception, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_killed, test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_subproc_timeout, test/inductor/test_max_autotune.py::TestTuningProcess::test_visible_devices, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_add_feedback_saver, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_clear_feedback_savers, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_feedback_saver_integration, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_tuning_pool_crash, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_tuning_pool_multiple_devices, test/inductor/test_max_autotune.py::TestTuningProcessPool::test_tuning_pool_timeout, test/inductor/test_max_autotune.py::TestPrologueFusion::test_broadcast_x_K_63, 
test/inductor/test_max_autotune.py::TestPrologueFusion::test_broadcast_x_K_64, test/inductor/test_max_autotune.py::TestPrologueFusion::test_broadcast_y, test/inductor/test_max_autotune.py::TestPrologueFusion::test_downcast, test/inductor/test_max_autotune.py::TestPrologueFusion::test_gather_fusion, test/inductor/test_max_autotune.py::TestPrologueFusion::test_low_precision, test/inductor/test_max_autotune.py::TestPrologueFusion::test_mismatched_prologue_group, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_fusions_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_fusions_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_fusions_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_inputs_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_inputs_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_multiple_inputs_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_pending_fusion_pro_and_epi, test/inductor/test_max_autotune.py::TestPrologueFusion::test_pending_fusions_multiple, test/inductor/test_max_autotune.py::TestPrologueFusion::test_preserves_zero_analysis, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_masked_load_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_masked_load_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_masked_load_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_multiple_nodes_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_multiple_nodes_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_multiple_nodes_sizes2, test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_read_into_both_inputs_benchmark_fusion_False, 
test/inductor/test_max_autotune.py::TestPrologueFusion::test_prologue_read_into_both_inputs_benchmark_fusion_True, test/inductor/test_max_autotune.py::TestPrologueFusion::test_storage_offset_prologue, test/inductor/test_max_autotune.py::TestPrologueFusion::test_upcast_sizes0, test/inductor/test_max_autotune.py::TestPrologueFusion::test_upcast_sizes1, test/inductor/test_max_autotune.py::TestPrologueFusion::test_upcast_sizes2 2025-09-07T06:54:29.0888465Z 2025-09-07T06:54:29.0888558Z Running dynamo/test_flat_apply 1/1 ... [2025-09-07 06:54:29.082505] 2025-09-07T06:54:29.0888747Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:29.0889144Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_flat_apply.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:54:29.082710] 2025-09-07T06:54:31.4518777Z 2025-09-07T06:54:31.4520586Z dynamo/test_flat_apply 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_flat_apply_1.1_da7b9db2886833d9_.log 2025-09-07T06:54:31.4522745Z Running 4 items in this shard: test/dynamo/test_flat_apply.py::FlatApplyTests::test_non_tensor_output, test/dynamo/test_flat_apply.py::FlatApplyTests::test_nonstrict_trace_captured_tensor_post_aot_graph, test/dynamo/test_flat_apply.py::FlatApplyTests::test_nonstrict_trace_dynamo_graph, test/dynamo/test_flat_apply.py::FlatApplyTests::test_simple 2025-09-07T06:54:31.4523267Z 2025-09-07T06:54:31.4524791Z Running dynamo/test_frame_init 1/1 ... [2025-09-07 06:54:31.452023] 2025-09-07T06:54:31.4525423Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:31.4537917Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_frame_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:54:31.452295] 2025-09-07T06:54:33.5215322Z 2025-09-07T06:54:33.5216701Z dynamo/test_frame_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_frame_init_1.1_e4d302e769d9e235_.log 2025-09-07T06:54:33.5217470Z Running 1 items in this shard: test/dynamo/test_frame_init.py::FrameInitTests::test_frame_init 2025-09-07T06:54:33.5217722Z 2025-09-07T06:54:33.5220342Z Running dynamo/test_functions 1/1 ... [2025-09-07 06:54:33.521461] 2025-09-07T06:54:33.5220646Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:54:33.5221304Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_functions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:54:33.521747] 2025-09-07T06:55:15.8698720Z 2025-09-07T06:55:15.8706996Z dynamo/test_functions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_functions_1.1_95fa2f72c0db781a_.log 2025-09-07T06:55:15.8757877Z Running 469 items in this shard: test/dynamo/test_functions.py::FunctionTests::test_T, test/dynamo/test_functions.py::FunctionTests::test_add, test/dynamo/test_functions.py::FunctionTests::test_add_, test/dynamo/test_functions.py::FunctionTests::test_addcdiv, test/dynamo/test_functions.py::FunctionTests::test_addcdiv_, test/dynamo/test_functions.py::FunctionTests::test_addcmul_, test/dynamo/test_functions.py::FunctionTests::test_are_functorch_transforms_active, test/dynamo/test_functions.py::FunctionTests::test_attrgetter, test/dynamo/test_functions.py::FunctionTests::test_broadcast_foreach_pow, test/dynamo/test_functions.py::FunctionTests::test_build_list_unpack, test/dynamo/test_functions.py::FunctionTests::test_call_dict1, test/dynamo/test_functions.py::FunctionTests::test_call_dict2, test/dynamo/test_functions.py::FunctionTests::test_call_dict3, 
test/dynamo/test_functions.py::FunctionTests::test_call_dict4, test/dynamo/test_functions.py::FunctionTests::test_call_dict5, test/dynamo/test_functions.py::FunctionTests::test_callable_builtin, test/dynamo/test_functions.py::FunctionTests::test_callable_class, test/dynamo/test_functions.py::FunctionTests::test_callable_lambda, test/dynamo/test_functions.py::FunctionTests::test_callable_list, test/dynamo/test_functions.py::FunctionTests::test_callable_torch, test/dynamo/test_functions.py::FunctionTests::test_chunks1, test/dynamo/test_functions.py::FunctionTests::test_class_dict, test/dynamo/test_functions.py::FunctionTests::test_cls_eq, test/dynamo/test_functions.py::FunctionTests::test_cls_hasattr, test/dynamo/test_functions.py::FunctionTests::test_cls_is, test/dynamo/test_functions.py::FunctionTests::test_compare_constant_and_tensor, test/dynamo/test_functions.py::FunctionTests::test_complex_closure, test/dynamo/test_functions.py::FunctionTests::test_const_tuple_add1, test/dynamo/test_functions.py::FunctionTests::test_const_tuple_add2, test/dynamo/test_functions.py::FunctionTests::test_constant1, test/dynamo/test_functions.py::FunctionTests::test_constant2, test/dynamo/test_functions.py::FunctionTests::test_constant3, test/dynamo/test_functions.py::FunctionTests::test_constant4, test/dynamo/test_functions.py::FunctionTests::test_constant_set, test/dynamo/test_functions.py::FunctionTests::test_context_wrapping_nested_functions_no_closure, test/dynamo/test_functions.py::FunctionTests::test_cublas_allow_tf32, test/dynamo/test_functions.py::FunctionTests::test_custom_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_default_dict_closure, test/dynamo/test_functions.py::FunctionTests::test_default_dict_constr, test/dynamo/test_functions.py::FunctionTests::test_default_dict_dict, test/dynamo/test_functions.py::FunctionTests::test_default_dict_lambda, test/dynamo/test_functions.py::FunctionTests::test_default_dict_list, 
test/dynamo/test_functions.py::FunctionTests::test_default_dict_set, test/dynamo/test_functions.py::FunctionTests::test_default_dict_tuple, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault1, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault2, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault3, test/dynamo/test_functions.py::FunctionTests::test_del, test/dynamo/test_functions.py::FunctionTests::test_deque, test/dynamo/test_functions.py::FunctionTests::test_device, test/dynamo/test_functions.py::FunctionTests::test_device_constant, test/dynamo/test_functions.py::FunctionTests::test_dict_copy, test/dynamo/test_functions.py::FunctionTests::test_dict_fromkeys, test/dynamo/test_functions.py::FunctionTests::test_dict_hasattr, test/dynamo/test_functions.py::FunctionTests::test_dict_id_guard, test/dynamo/test_functions.py::FunctionTests::test_dict_items_sorted, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set1, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set2, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set3, test/dynamo/test_functions.py::FunctionTests::test_dict_keys, test/dynamo/test_functions.py::FunctionTests::test_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_dict_mutable_map, test/dynamo/test_functions.py::FunctionTests::test_dict_ops, test/dynamo/test_functions.py::FunctionTests::test_dict_param_keys, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault1, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault2, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault3, test/dynamo/test_functions.py::FunctionTests::test_dict_sorted, test/dynamo/test_functions.py::FunctionTests::test_dict_tuple_lazy_guard, test/dynamo/test_functions.py::FunctionTests::test_dict_update, test/dynamo/test_functions.py::FunctionTests::test_dict_update_kwargs, 
test/dynamo/test_functions.py::FunctionTests::test_dict_values, test/dynamo/test_functions.py::FunctionTests::test_distributed_is_available, test/dynamo/test_functions.py::FunctionTests::test_distributed_is_initialized, test/dynamo/test_functions.py::FunctionTests::test_dtype, test/dynamo/test_functions.py::FunctionTests::test_dtype_compare, test/dynamo/test_functions.py::FunctionTests::test_elipsis, test/dynamo/test_functions.py::FunctionTests::test_enumerate, test/dynamo/test_functions.py::FunctionTests::test_enumerate_custom, test/dynamo/test_functions.py::FunctionTests::test_enumerate_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter, test/dynamo/test_functions.py::FunctionTests::test_filter_fallback, test/dynamo/test_functions.py::FunctionTests::test_filter_graph_break_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter_infinite_iterator, test/dynamo/test_functions.py::FunctionTests::test_filter_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter_with_graph_break, test/dynamo/test_functions.py::FunctionTests::test_finfo, test/dynamo/test_functions.py::FunctionTests::test_flat_param_same_storage_size, test/dynamo/test_functions.py::FunctionTests::test_float, test/dynamo/test_functions.py::FunctionTests::test_fn_with_self_set, test/dynamo/test_functions.py::FunctionTests::test_foreach_lerp_, test/dynamo/test_functions.py::FunctionTests::test_fstrings1, test/dynamo/test_functions.py::FunctionTests::test_fstrings2, test/dynamo/test_functions.py::FunctionTests::test_fstrings3, test/dynamo/test_functions.py::FunctionTests::test_fstrings4, test/dynamo/test_functions.py::FunctionTests::test_fstrings5, test/dynamo/test_functions.py::FunctionTests::test_fstrings6, test/dynamo/test_functions.py::FunctionTests::test_funcdef_closure, test/dynamo/test_functions.py::FunctionTests::test_functools_cache_guard, test/dynamo/test_functions.py::FunctionTests::test_functools_partial, 
test/dynamo/test_functions.py::FunctionTests::test_functools_partial_binding, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_hasattr, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_subclass, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_user_methods, test/dynamo/test_functions.py::FunctionTests::test_get_autocast_gpu_dtype, test/dynamo/test_functions.py::FunctionTests::test_get_calculate_correct_fan, test/dynamo/test_functions.py::FunctionTests::test_get_default_dtype, test/dynamo/test_functions.py::FunctionTests::test_get_device_properties_tensor_device, test/dynamo/test_functions.py::FunctionTests::test_get_privateuse1_name, test/dynamo/test_functions.py::FunctionTests::test_getattr, test/dynamo/test_functions.py::FunctionTests::test_getattr_metaclass, test/dynamo/test_functions.py::FunctionTests::test_globalfn, test/dynamo/test_functions.py::FunctionTests::test_globalmodule, test/dynamo/test_functions.py::FunctionTests::test_globalvar, test/dynamo/test_functions.py::FunctionTests::test_import1, test/dynamo/test_functions.py::FunctionTests::test_in_not_in, test/dynamo/test_functions.py::FunctionTests::test_index, test/dynamo/test_functions.py::FunctionTests::test_indexed_range, test/dynamo/test_functions.py::FunctionTests::test_indirect1, test/dynamo/test_functions.py::FunctionTests::test_indirect2, test/dynamo/test_functions.py::FunctionTests::test_indirect3, test/dynamo/test_functions.py::FunctionTests::test_inline_jit__unwrap_optional, test/dynamo/test_functions.py::FunctionTests::test_inline_jit_annotations, test/dynamo/test_functions.py::FunctionTests::test_inline_lru_cache_fn_with_default_args, test/dynamo/test_functions.py::FunctionTests::test_inline_script_if_tracing_fn_with_default_args, test/dynamo/test_functions.py::FunctionTests::test_inline_softmax, test/dynamo/test_functions.py::FunctionTests::test_inline_with_default, 
test/dynamo/test_functions.py::FunctionTests::test_inner_function, test/dynamo/test_functions.py::FunctionTests::test_is, test/dynamo/test_functions.py::FunctionTests::test_is_any_autocast_enabled, test/dynamo/test_functions.py::FunctionTests::test_is_checkpoint_valid, test/dynamo/test_functions.py::FunctionTests::test_is_complex, test/dynamo/test_functions.py::FunctionTests::test_is_contiguous_frame_counts, test/dynamo/test_functions.py::FunctionTests::test_is_contiguous_memory_format, test/dynamo/test_functions.py::FunctionTests::test_is_floating_point, test/dynamo/test_functions.py::FunctionTests::test_is_fx_tracing, test/dynamo/test_functions.py::FunctionTests::test_is_in_onnx_export, test/dynamo/test_functions.py::FunctionTests::test_is_inference_mode_global_recompilation, test/dynamo/test_functions.py::FunctionTests::test_is_inference_recompilation, test/dynamo/test_functions.py::FunctionTests::test_is_integer, test/dynamo/test_functions.py::FunctionTests::test_is_not, test/dynamo/test_functions.py::FunctionTests::test_is_not_null, test/dynamo/test_functions.py::FunctionTests::test_is_quantized, test/dynamo/test_functions.py::FunctionTests::test_is_sparse, test/dynamo/test_functions.py::FunctionTests::test_isinstance, test/dynamo/test_functions.py::FunctionTests::test_islice_chain, test/dynamo/test_functions.py::FunctionTests::test_itemgetter, test/dynamo/test_functions.py::FunctionTests::test_itertools_chain, test/dynamo/test_functions.py::FunctionTests::test_itertools_chain_from_iterable, test/dynamo/test_functions.py::FunctionTests::test_itertools_combinations, test/dynamo/test_functions.py::FunctionTests::test_itertools_compress, test/dynamo/test_functions.py::FunctionTests::test_itertools_compress_tensors, test/dynamo/test_functions.py::FunctionTests::test_itertools_filterfalse_basic, test/dynamo/test_functions.py::FunctionTests::test_itertools_pairwise, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_args, 
test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_basic, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_various_iterators, test/dynamo/test_functions.py::FunctionTests::test_itertools_product, test/dynamo/test_functions.py::FunctionTests::test_itertools_product_args, test/dynamo/test_functions.py::FunctionTests::test_itertools_product_various_iterators, test/dynamo/test_functions.py::FunctionTests::test_itertools_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_jit_annotate, test/dynamo/test_functions.py::FunctionTests::test_len_constant_dict, test/dynamo/test_functions.py::FunctionTests::test_len_constant_list, test/dynamo/test_functions.py::FunctionTests::test_len_constant_misc_iterables, test/dynamo/test_functions.py::FunctionTests::test_len_tensor, test/dynamo/test_functions.py::FunctionTests::test_list_add, test/dynamo/test_functions.py::FunctionTests::test_list_add_then_mutate, test/dynamo/test_functions.py::FunctionTests::test_list_clear, test/dynamo/test_functions.py::FunctionTests::test_list_compare_polyfill, test/dynamo/test_functions.py::FunctionTests::test_list_compare_polyfill_non_lists, test/dynamo/test_functions.py::FunctionTests::test_list_convert, test/dynamo/test_functions.py::FunctionTests::test_list_expand_lhs, test/dynamo/test_functions.py::FunctionTests::test_list_index_with_constant_tensor, test/dynamo/test_functions.py::FunctionTests::test_list_reversed, test/dynamo/test_functions.py::FunctionTests::test_list_setitem, test/dynamo/test_functions.py::FunctionTests::test_list_setitem_slice, test/dynamo/test_functions.py::FunctionTests::test_list_slice, test/dynamo/test_functions.py::FunctionTests::test_list_slice_assignment, test/dynamo/test_functions.py::FunctionTests::test_list_sorted1, test/dynamo/test_functions.py::FunctionTests::test_list_sorted2, test/dynamo/test_functions.py::FunctionTests::test_list_truth, test/dynamo/test_functions.py::FunctionTests::test_listarg1, 
test/dynamo/test_functions.py::FunctionTests::test_listarg2, test/dynamo/test_functions.py::FunctionTests::test_listarg3, test/dynamo/test_functions.py::FunctionTests::test_listarg4, test/dynamo/test_functions.py::FunctionTests::test_listarg5, test/dynamo/test_functions.py::FunctionTests::test_load_global_bool, test/dynamo/test_functions.py::FunctionTests::test_lru_cache_warning_issued_during_tracing, test/dynamo/test_functions.py::FunctionTests::test_mT, test/dynamo/test_functions.py::FunctionTests::test_manual_seed, test/dynamo/test_functions.py::FunctionTests::test_map_call_function_ex, test/dynamo/test_functions.py::FunctionTests::test_map_deque_extendleft, test/dynamo/test_functions.py::FunctionTests::test_map_dict_fromkeys, test/dynamo/test_functions.py::FunctionTests::test_map_enumerate, test/dynamo/test_functions.py::FunctionTests::test_map_infinite, test/dynamo/test_functions.py::FunctionTests::test_map_iter, test/dynamo/test_functions.py::FunctionTests::test_map_list, test/dynamo/test_functions.py::FunctionTests::test_map_list_extend, test/dynamo/test_functions.py::FunctionTests::test_map_list_slice_assign, test/dynamo/test_functions.py::FunctionTests::test_map_max, test/dynamo/test_functions.py::FunctionTests::test_map_max_const, test/dynamo/test_functions.py::FunctionTests::test_map_partial_unpack, test/dynamo/test_functions.py::FunctionTests::test_map_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_map_reduce, test/dynamo/test_functions.py::FunctionTests::test_map_return, test/dynamo/test_functions.py::FunctionTests::test_map_set, test/dynamo/test_functions.py::FunctionTests::test_map_sorted, test/dynamo/test_functions.py::FunctionTests::test_map_str_join, test/dynamo/test_functions.py::FunctionTests::test_map_sum, test/dynamo/test_functions.py::FunctionTests::test_map_tuple, test/dynamo/test_functions.py::FunctionTests::test_map_unpack_twice, test/dynamo/test_functions.py::FunctionTests::test_map_unpack_vars, 
test/dynamo/test_functions.py::FunctionTests::test_map_with_graph_break, test/dynamo/test_functions.py::FunctionTests::test_map_zip_dict, test/dynamo/test_functions.py::FunctionTests::test_math_radians, test/dynamo/test_functions.py::FunctionTests::test_mean_sum_np, test/dynamo/test_functions.py::FunctionTests::test_methodcall1, test/dynamo/test_functions.py::FunctionTests::test_methodcall2, test/dynamo/test_functions.py::FunctionTests::test_methodcall3, test/dynamo/test_functions.py::FunctionTests::test_methodcaller, test/dynamo/test_functions.py::FunctionTests::test_min_max, test/dynamo/test_functions.py::FunctionTests::test_module_constant, test/dynamo/test_functions.py::FunctionTests::test_namedtuple, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_defaults, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_fields, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_hasattr, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_replace, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_subclass, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_user_methods, test/dynamo/test_functions.py::FunctionTests::test_ndarray_builtin_functions, test/dynamo/test_functions.py::FunctionTests::test_ndarray_method, test/dynamo/test_functions.py::FunctionTests::test_ndarray_methods_returning_scalar, test/dynamo/test_functions.py::FunctionTests::test_ndarray_reshape, test/dynamo/test_functions.py::FunctionTests::test_ndarray_transpose, test/dynamo/test_functions.py::FunctionTests::test_ndim, test/dynamo/test_functions.py::FunctionTests::test_no_recompile_inner_function, test/dynamo/test_functions.py::FunctionTests::test_no_recompile_inner_lambda, test/dynamo/test_functions.py::FunctionTests::test_non_inlined_closure, test/dynamo/test_functions.py::FunctionTests::test_not_list, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_as_input_int_or_float_float, 
test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_as_input_int_or_float_int, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_guards_float, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_guards_int, test/dynamo/test_functions.py::FunctionTests::test_np_finfo, test/dynamo/test_functions.py::FunctionTests::test_np_iinfo, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_as_integer_ratio_num_type0, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_as_integer_ratio_num_type3, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_bit_length_num_type1, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_conjugate_num_type2, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_conjugate_num_type4, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_hex_num_type5, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_is_integer_num_type6, test/dynamo/test_functions.py::FunctionTests::test_numpy_attributes, test/dynamo/test_functions.py::FunctionTests::test_numpy_dtype_argument_to_function, test/dynamo/test_functions.py::FunctionTests::test_numpy_dtype_call_in_function, test/dynamo/test_functions.py::FunctionTests::test_numpy_fft, test/dynamo/test_functions.py::FunctionTests::test_numpy_linalg, test/dynamo/test_functions.py::FunctionTests::test_numpy_meshgrid, test/dynamo/test_functions.py::FunctionTests::test_numpy_random, test/dynamo/test_functions.py::FunctionTests::test_numpy_size, test/dynamo/test_functions.py::FunctionTests::test_obj_eq, test/dynamo/test_functions.py::FunctionTests::test_obj_is, test/dynamo/test_functions.py::FunctionTests::test_ordered_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_partial_across_graph_break_uninvoked, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_UDF, 
test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_partials_lambda, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_partials_mod, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_args_and_kwargs, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_mix, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_mix_no_source, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___annotations__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___builtins__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___call__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___class__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___closure__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___code__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___defaults__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___delattr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___dict__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___dir__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___doc__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___eq__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___format__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___ge__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___get__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___getattribute__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___globals__, 
test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___gt__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___hash__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___init__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___init_subclass__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___kwdefaults__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___le__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___lt__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___module__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___name__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___ne__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___new__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___qualname__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___reduce__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___reduce_ex__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___repr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___setattr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___sizeof__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___str__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___subclasshook__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_args, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_func, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_keywords, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_set_attr, test/dynamo/test_functions.py::FunctionTests::test_partials_lambda, 
test/dynamo/test_functions.py::FunctionTests::test_partials_recompilation, test/dynamo/test_functions.py::FunctionTests::test_partials_torch_op_arg, test/dynamo/test_functions.py::FunctionTests::test_partials_torch_op_kwarg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_arg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg_method, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg_module, test/dynamo/test_functions.py::FunctionTests::test_pop, test/dynamo/test_functions.py::FunctionTests::test_pos, test/dynamo/test_functions.py::FunctionTests::test_pow_int, test/dynamo/test_functions.py::FunctionTests::test_promote_types, test/dynamo/test_functions.py::FunctionTests::test_rand_inlined, test/dynamo/test_functions.py::FunctionTests::test_rand_tensor_partial, test/dynamo/test_functions.py::FunctionTests::test_range1, test/dynamo/test_functions.py::FunctionTests::test_range2, test/dynamo/test_functions.py::FunctionTests::test_range_iterator, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_2, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_graph_break, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_graph_break_2, test/dynamo/test_functions.py::FunctionTests::test_range_length, test/dynamo/test_functions.py::FunctionTests::test_range_with_index, test/dynamo/test_functions.py::FunctionTests::test_range_with_slice_index, test/dynamo/test_functions.py::FunctionTests::test_reduce, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_initial, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_none_initial, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_single, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_single_with_initial, test/dynamo/test_functions.py::FunctionTests::test_return_dict, 
test/dynamo/test_functions.py::FunctionTests::test_return_dict2, test/dynamo/test_functions.py::FunctionTests::test_return_multiple_numpy_ndarray, test/dynamo/test_functions.py::FunctionTests::test_return_numpy_ndarray, test/dynamo/test_functions.py::FunctionTests::test_return_tuple1, test/dynamo/test_functions.py::FunctionTests::test_return_tuple2, test/dynamo/test_functions.py::FunctionTests::test_returning_recursive_func, test/dynamo/test_functions.py::FunctionTests::test_round, test/dynamo/test_functions.py::FunctionTests::test_set_add, test/dynamo/test_functions.py::FunctionTests::test_set_in_frozenset, test/dynamo/test_functions.py::FunctionTests::test_set_keys_view, test/dynamo/test_functions.py::FunctionTests::test_set_update_bytecode, test/dynamo/test_functions.py::FunctionTests::test_set_update_list_with_duplicated_items, test/dynamo/test_functions.py::FunctionTests::test_shape1, test/dynamo/test_functions.py::FunctionTests::test_shape2, test/dynamo/test_functions.py::FunctionTests::test_size_tuple_add, test/dynamo/test_functions.py::FunctionTests::test_slice1, test/dynamo/test_functions.py::FunctionTests::test_slice2, test/dynamo/test_functions.py::FunctionTests::test_slice3, test/dynamo/test_functions.py::FunctionTests::test_slice4, test/dynamo/test_functions.py::FunctionTests::test_slice5, test/dynamo/test_functions.py::FunctionTests::test_slice6, test/dynamo/test_functions.py::FunctionTests::test_slice_eq, test/dynamo/test_functions.py::FunctionTests::test_sliced_range, test/dynamo/test_functions.py::FunctionTests::test_sorted_const_key_non_const_items, test/dynamo/test_functions.py::FunctionTests::test_sourceless_build_method_type, test/dynamo/test_functions.py::FunctionTests::test_startswith, test/dynamo/test_functions.py::FunctionTests::test_sum, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut_with_start_arg, 
test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut_with_start_kwarg, test/dynamo/test_functions.py::FunctionTests::test_sum_with_start_arg, test/dynamo/test_functions.py::FunctionTests::test_sum_with_start_kwarg, test/dynamo/test_functions.py::FunctionTests::test_symbool_to_int, test/dynamo/test_functions.py::FunctionTests::test_tensor_dim, test/dynamo/test_functions.py::FunctionTests::test_tensor_element_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_is_complex, test/dynamo/test_functions.py::FunctionTests::test_tensor_len, test/dynamo/test_functions.py::FunctionTests::test_tensor_new_with_shape, test/dynamo/test_functions.py::FunctionTests::test_tensor_new_with_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_size_indexed_by_symint, test/dynamo/test_functions.py::FunctionTests::test_tensor_type, test/dynamo/test_functions.py::FunctionTests::test_tensor_type2, test/dynamo/test_functions.py::FunctionTests::test_tensor_type3, test/dynamo/test_functions.py::FunctionTests::test_tensor_type4, test/dynamo/test_functions.py::FunctionTests::test_tensor_type5, test/dynamo/test_functions.py::FunctionTests::test_to, test/dynamo/test_functions.py::FunctionTests::test_torch_distributions_functions, test/dynamo/test_functions.py::FunctionTests::test_torch_from_numpy, test/dynamo/test_functions.py::FunctionTests::test_torch_get_device_module, test/dynamo/test_functions.py::FunctionTests::test_torch_size_as_dict_key, test/dynamo/test_functions.py::FunctionTests::test_torch_size_hasattr, test/dynamo/test_functions.py::FunctionTests::test_torch_source, test/dynamo/test_functions.py::FunctionTests::test_transpose_for_scores, test/dynamo/test_functions.py::FunctionTests::test_truth, test/dynamo/test_functions.py::FunctionTests::test_tuple1, test/dynamo/test_functions.py::FunctionTests::test_tuple2, test/dynamo/test_functions.py::FunctionTests::test_tuple_contains, 
test/dynamo/test_functions.py::FunctionTests::test_tuple_iadd, test/dynamo/test_functions.py::FunctionTests::test_tuple_map, test/dynamo/test_functions.py::FunctionTests::test_tuple_sorted, test/dynamo/test_functions.py::FunctionTests::test_two_point_iter, test/dynamo/test_functions.py::FunctionTests::test_unary_fold_op, test/dynamo/test_functions.py::FunctionTests::test_unary_fold_op_seq, test/dynamo/test_functions.py::FunctionTests::test_unpack1, test/dynamo/test_functions.py::FunctionTests::test_unpack2, test/dynamo/test_functions.py::FunctionTests::test_unpack3, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex1, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex2, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex3, test/dynamo/test_functions.py::FunctionTests::test_unpack_mutable_map, test/dynamo/test_functions.py::FunctionTests::test_unsqueeze_inplace, test/dynamo/test_functions.py::FunctionTests::test_viamethod, test/dynamo/test_functions.py::FunctionTests::test_viatorch, test/dynamo/test_functions.py::FunctionTests::test_zip_longest, test/dynamo/test_functions.py::FunctionTests::test_zip_reconstruct, test/dynamo/test_functions.py::DefaultsTests::test_cast_tensor_single_elem, test/dynamo/test_functions.py::DefaultsTests::test_cuda_current_device, test/dynamo/test_functions.py::DefaultsTests::test_dataclass_factory, test/dynamo/test_functions.py::DefaultsTests::test_dataclass_nested, test/dynamo/test_functions.py::DefaultsTests::test_fn_with_attr, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_construction, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_illegal_call_method, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_copy, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_difference, 
test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_intersection, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_symmetric_difference, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_union, test/dynamo/test_functions.py::DefaultsTests::test_func_attrs, test/dynamo/test_functions.py::DefaultsTests::test_func_default_tensor_args, test/dynamo/test_functions.py::DefaultsTests::test_func_default_torch_args, test/dynamo/test_functions.py::DefaultsTests::test_functional_compile, test/dynamo/test_functions.py::DefaultsTests::test_functools_partial_id, test/dynamo/test_functions.py::DefaultsTests::test_fx_immutable_list_mutation_not_allowed, test/dynamo/test_functions.py::DefaultsTests::test_fx_map_aggregate, test/dynamo/test_functions.py::DefaultsTests::test_in_set_inplace, test/dynamo/test_functions.py::DefaultsTests::test_in_set_would_fail_broadcast, test/dynamo/test_functions.py::DefaultsTests::test_inspect_method_source, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_vmapped_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_vmapped_mutated_tensor_tensor_multi_arg, test/dynamo/test_functions.py::DefaultsTests::test_is_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_mutated_tensor_tensor_across_graph_break, test/dynamo/test_functions.py::DefaultsTests::test_is_not_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_vmapped_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_keyword, test/dynamo/test_functions.py::DefaultsTests::test_listlike_of_tensors_contains_constant, test/dynamo/test_functions.py::DefaultsTests::test_meth_default_tensor_args, 
test/dynamo/test_functions.py::DefaultsTests::test_pybind_object, test/dynamo/test_functions.py::DefaultsTests::test_reconstructed_name, test/dynamo/test_functions.py::DefaultsTests::test_set_call___init___frozenset, test/dynamo/test_functions.py::DefaultsTests::test_set_call___init___set, test/dynamo/test_functions.py::DefaultsTests::test_set_construction, test/dynamo/test_functions.py::DefaultsTests::test_skip_function_call_very_weird_value, test/dynamo/test_functions.py::DefaultsTests::test_str_handler_for_user_defined_object, test/dynamo/test_functions.py::DefaultsTests::test_sys_recursionlimit, test/dynamo/test_functions.py::DefaultsTests::test_tree_map, test/dynamo/test_functions.py::DefaultsTests::test_udf_list, test/dynamo/test_functions.py::DefaultsTests::test_udf_list_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_udf_list_slice, test/dynamo/test_functions.py::DefaultsTests::test_udf_namedtuple, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_construction, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_construction_custom_new, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_zip_strict 2025-09-07T06:55:15.8802192Z 2025-09-07T06:55:15.8802274Z Running dynamo/test_generator 1/1 ... [2025-09-07 06:55:15.870187] 2025-09-07T06:55:15.8802439Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:55:15.8802822Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_generator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:55:15.870415] 2025-09-07T06:55:19.5419261Z 2025-09-07T06:55:19.5420753Z dynamo/test_generator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_generator_1.1_42bea212c9e7920b_.log 2025-09-07T06:55:19.5442272Z Running 78 items in this shard: test/dynamo/test_generator.py::GeneratorTests::test_cleanup_throw, test/dynamo/test_generator.py::GeneratorTests::test_deque_extendleft, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container0, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container1, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container2, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container3, test/dynamo/test_generator.py::GeneratorTests::test_dynamo_disable_generator, test/dynamo/test_generator.py::GeneratorTests::test_dynamo_disable_sub_generator, test/dynamo/test_generator.py::GeneratorTests::test_generator___contains__, test/dynamo/test_generator.py::GeneratorTests::test_generator___contains___side_effects, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_2, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_3, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_4, test/dynamo/test_generator.py::GeneratorTests::test_generator_simple, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects_graph_break, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects_graph_break_2, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_and_reconstruct_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_before_calling_generator, 
test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator_while_reconstructing, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_outside_generator, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator_3, test/dynamo/test_generator.py::GeneratorTests::test_islice_chain, test/dynamo/test_generator.py::GeneratorTests::test_iter, test/dynamo/test_generator.py::GeneratorTests::test_list_extend, test/dynamo/test_generator.py::GeneratorTests::test_list_zip_generator, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_tensor_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_dict_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_dict_mutation_before, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_local_var_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_object_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_object_mutation_before, test/dynamo/test_generator.py::GeneratorTests::test_return_advanced_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_exhaust_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_subgenerator, test/dynamo/test_generator.py::GeneratorTests::test_return_tuple_generator, test/dynamo/test_generator.py::GeneratorTests::test_subgenerator, test/dynamo/test_generator.py::GeneratorTests::test_subgenerator_with_side_effects, test/dynamo/test_generator.py::GeneratorTests::test_zip_generator, 
test/dynamo/test_generator.py::GeneratorTests::test_zip_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_zip_infinite_generator, test/dynamo/test_generator.py::GeneratorTests::test_zip_subgenerator, test/dynamo/test_generator.py::TestGeneratorSend::test_send, test/dynamo/test_generator.py::TestGeneratorSend::test_send_stop_iteration_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorSend::test_send_stop_iteration_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorClose::test_close, test/dynamo/test_generator.py::TestGeneratorClose::test_close_after_close, test/dynamo/test_generator.py::TestGeneratorClose::test_close_after_exception, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_return, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_GeneratorExit, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_exc_exc0, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_exc_exc1, test/dynamo/test_generator.py::TestGeneratorClose::test_close_handling_finally, test/dynamo/test_generator.py::TestGeneratorClose::test_close_subgen, test/dynamo/test_generator.py::TestGeneratorClose::test_close_with_side_effects, test/dynamo/test_generator.py::TestGeneratorClose::test_close_with_subgen, test/dynamo/test_generator.py::TestGeneratorClose::test_next_after_close_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorClose::test_next_after_close_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorThrow::test_exception_context_with_yield, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_None_in_except_and_finally, 
test/dynamo/test_generator.py::TestGeneratorThrow::test_return_const_value_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_value_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_no_yield_after_throw, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_not_catch, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_raise_difference_exc, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_try_except_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_with_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_without_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_yield_finally 2025-09-07T06:55:19.5451629Z 2025-09-07T06:55:19.5451709Z Running dynamo/test_global 1/1 ... [2025-09-07 06:55:19.542035] 2025-09-07T06:55:19.5451874Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:55:19.5452252Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_global.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:55:19.542271] 2025-09-07T06:55:26.5191526Z 2025-09-07T06:55:26.5199774Z dynamo/test_global 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_global_1.1_b385a5a38ac9780a_.log 2025-09-07T06:55:26.5201620Z Running 12 items in this shard: test/dynamo/test_global.py::TestGlobals::test_store_global_1, test/dynamo/test_global.py::TestGlobals::test_store_global_2, test/dynamo/test_global.py::TestGlobals::test_store_global_cross_file, test/dynamo/test_global.py::TestGlobals::test_store_global_crossfile_inline, test/dynamo/test_global.py::TestGlobals::test_store_global_dict, test/dynamo/test_global.py::TestGlobals::test_store_global_dict_2, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_1, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_2, test/dynamo/test_global.py::TestGlobals::test_store_global_list, test/dynamo/test_global.py::TestGlobals::test_store_global_list_2, test/dynamo/test_global.py::TestGlobals::test_store_global_new, test/dynamo/test_global.py::TestGlobals::test_store_global_object 2025-09-07T06:55:26.5202786Z 2025-09-07T06:55:26.5202889Z Running dynamo/test_graph_region_tracker 1/1 ... [2025-09-07 06:55:26.518855] 2025-09-07T06:55:26.5203070Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:55:26.5203471Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_graph_region_tracker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:55:26.519046] 2025-09-07T06:55:29.8067105Z 2025-09-07T06:55:29.8068580Z dynamo/test_graph_region_tracker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_graph_region_tracker_1.1_e56386b2eb378383_.log 2025-09-07T06:55:29.8071125Z Running 13 items in this shard: test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_get_regions_multiple_region_groups, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_get_regions_single_region_group, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_mismatched_arg_shapes, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_mismatched_dtypes, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_mismatched_global_state, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_mutation_tracking_allow_in_graph, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_mutation_tracking_setitem, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_mutation_tracking_simple, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_nested_args, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_no_duplicate_tracking, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_no_single_node_regions, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_non_tensor_arg_hashing, test/dynamo/test_graph_region_tracker.py::GraphRegionTrackerTests::test_region_sorting 2025-09-07T06:55:29.8073189Z 2025-09-07T06:55:29.8073273Z Running dynamo/test_guard_manager 1/1 ... 
[2025-09-07 06:55:29.804621] 2025-09-07T06:55:29.8079756Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:55:29.8080160Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_guard_manager.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:55:29.804837] 2025-09-07T06:55:36.5427539Z 2025-09-07T06:55:36.5428489Z dynamo/test_guard_manager 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_manager_1.1_1408ddc5cb1fd047_.log 2025-09-07T06:55:36.5433561Z Running 37 items in this shard: test/dynamo/test_guard_manager.py::GuardManagerTests::test_attr_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_call_function_no_args_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_clone, test/dynamo/test_guard_manager.py::GuardManagerTests::test_default_device_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_contains_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_getitem_accessor, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_version_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_diff_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dynamic_indices_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_equals_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_framelocals_accessor, test/dynamo/test_guard_manager.py::GuardManagerTests::test_framelocals_guard_e2e, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_state_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_state_reason, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_weakref, 
test/dynamo/test_guard_manager.py::GuardManagerTests::test_globals, test/dynamo/test_guard_manager.py::GuardManagerTests::test_guard_manager_leaf_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_id_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_item_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_lambda_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_length_check_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_no_hasattr_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_no_tensor_aliasing_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_python_lambda_leaf_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tensor_aliasing_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tensor_match_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tuple_iterator_getitem, test/dynamo/test_guard_manager.py::GuardManagerTests::test_type_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_type_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_weakref_alive_guard, test/dynamo/test_guard_manager.py::TypePropagationTests::test_basic_types, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_dict_tag_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_immutable_tag_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_nn_module_tag_overridden_getattr_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_nn_module_tag_safe, test/dynamo/test_guard_manager.py::RecursiveDictGuardTests::test_disabling 2025-09-07T06:55:36.5438205Z 2025-09-07T06:55:36.5444284Z Running dynamo/test_higher_order_ops 1/1 ... 
[2025-09-07 06:55:36.542698] 2025-09-07T06:55:36.5444664Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:55:36.5459835Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_higher_order_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:55:36.542933] 2025-09-07T06:56:14.8921571Z 2025-09-07T06:56:14.8922300Z dynamo/test_higher_order_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_higher_order_ops_1.1_95de588b5edf585a_.log 2025-09-07T06:56:14.8953040Z Running 229 items in this shard: test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_access_module_attr, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_allow_python_side_effects_utility, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_constants, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_global_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_global_num_adds_guard, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_input_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_numpy_number, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_tracked, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_tracked_nested, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_untracked_global, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_untracked_global_nested, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_untracked_nonlocal, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_capture_value_created_in_subgraph, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_concat_unbacked_shape_tensor, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_branches_no_arguments, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_branches_no_arguments_no_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_free_variable_in_both_branches, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_graph_break_in_one_branch, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_pytree_operands, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_pytree_operands_with_non_tensor_leaves, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_side_effect_in_one_branches, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_subgraph_name_is_valid, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_with_constant_pred, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_cond_with_empty_operands, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_dynamic_shapes_over_vmap_batch_size, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_enum_arg, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_error_message_sane, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_fallback_on_graph_break_complicated, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_fallback_on_graph_break_simple, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_flat_list_output, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_fn_with_kwargs_in_torch_ops, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_freevars_as_inputs_to_wrap, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_grad_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper_incorrect_type, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper_no_hints, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hints_wrapper_pytree_inputs, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hooks, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_hopify_generic_wrap, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_inlined_functions, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_internal_nonlocal, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_lift_tensor_constant, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_lift_tensors_with_compound_expressions, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_lift_tensors_with_shared_symbols, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_make_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_example_value_metadata_consistent_with_eager, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_kwargs, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_lowers_to_graph, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_multi_return, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_pytree_return, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_side_effect, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_subgraph_name_is_valid, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_map_symint_input, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_modules, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_nested_tuple_output, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_nested_wrap, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_no_freevars, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_output_with_dict, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_register_mode, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_register_subclass, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_return_captured_var, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_return_captured_var_used_multiple_times, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_return_captured_vars, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_same_freevar_twice, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_global_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_global_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_nonlocal_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_del_existing_attr_nonlocal_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_in_body, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_local_list_append_no_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_list, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_num, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_num_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_global_tensor_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_num, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_num_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_mutate_nonlocal_tensor_builtin, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_nested_nonlocal_list_append_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_nonlocal_list_append_graph_break, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_global_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_global_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_nonlocal_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_existing_attr_nonlocal_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_global_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_global_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_nonlocal_module, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_side_effect_set_new_attr_nonlocal_obj, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_support_float_in_output, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_symint_in_slice, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_symint_input, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_tensor_and_unbacked_symbol_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_tensor_to_list_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_tensor_with_unbacked_shape_closure, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_unbacked_symbol_closure, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_vmap_multiply_scalar, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_vmap_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_all_kwarg, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_allow_local_assign_in_body_fn, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_default, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_default_else_branch, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_default_if_branch, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_int, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_only, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_kwarg_recompile, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_args_nested, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_args_not_const_symint_tensor, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_args_with_symint_constant, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_pytree_kwargs, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_source_fn_stack, test/dynamo/test_higher_order_ops.py::HigherOrderOpTests::test_wrap_subgraph_name_is_valid, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_dual_level_guard, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_emit_functorch_guard_if_active, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_grad_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_jvp_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_linearize_recompiles, 
test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_grad_guard_ok, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_grad_vmap_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_guard_fail, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_guard_fail_different_state, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_guard_ok, test/dynamo/test_higher_order_ops.py::HigherOrderOpVmapGuardTests::test_vmap_recompile_different_states, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_disable_inline_nn_module, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_call_compiled_backward_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_call_torch_compile_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_capture_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_closure_scalar, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_fn_with_kwargs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_freevar_python_scalar, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_freevar_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_non_tensor_input, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_over_grad, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_pytree, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_recompile, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_two_tensor_all_grad_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_two_tensor_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_with_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_grad_with_side_effect, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_hessian, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_hessian_argnums, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd_randomness, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacfwd_two_tensors_argnums, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacrev, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacrev_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jacrev_two_tensors_argnums, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_call_torch_compile_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_freevar_python_scalar, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_freevar_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_jvp, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_simple, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_enable_disable_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_two_tensors_disable_grad, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_jvp_two_tensors_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_linearize_jvp_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_call_compiled_backward_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_has_aux, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_multiple_outputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vjp_multiple_outputs_python_struct, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_call_compiled_backward_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_call_torch_compile_fn, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_free_const, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_free_tensor, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_get_wrapped, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_kwargs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_in_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_invocation_out_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_outputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_diff_dims, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_multiple_outputs_out_dims_tuple, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_new_tensor_implicit_via_op, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_new_tensor_in_body, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_new_tensor_unused_in_body, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_out_dims_None, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_over_vmap_captured, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_over_vmap_two_inputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_previous_illegal_op_no_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_pytree_inputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile_different_config, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile_same_config, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_recompile_with_randomness, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_side_effects, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_side_effects_append_input, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_two_inputs, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_two_inputs_tuple_in_dims, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_conditional_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_graph_break, test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_graph_break_2, 
test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_vmap_with_graph_break_lambda, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_cond_with_invalid_kwargs, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_cond_with_kwargs, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_cond_with_mismatched_output, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_dropout, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_dropout_inductor, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_fallback, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_flop_counter_for_cond, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_flop_counter_for_cond_unbalanced_branches, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_flop_counter_for_nested_cond, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_function, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_function_with_kwargs, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_module, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_non_aliasing_util, test/dynamo/test_higher_order_ops.py::ActivationCheckpointingTests::test_override_fallthrough_dispatch_key, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_auto_functionalize_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_cond_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_invoke_quant_packed_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_invoke_quant_simple_cuda_float32, 
test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_invoke_subgraph_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_aot_eager_while_loop_stack_output_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_auto_functionalize_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_cond_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_invoke_quant_packed_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_invoke_quant_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_invoke_subgraph_simple_cuda_float32, test/dynamo/test_higher_order_ops.py::TestHigherOrderOpsOpInfoCUDA::test_hops_compile_backend_inductor_while_loop_stack_output_simple_cuda_float32 2025-09-07T06:56:14.8987151Z 2025-09-07T06:56:14.8987244Z Running dynamo/test_input_attr_tracking 1/1 ... [2025-09-07 06:56:14.892389] 2025-09-07T06:56:14.8987421Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:14.8987823Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_input_attr_tracking.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:56:14.892677] 2025-09-07T06:56:23.1237203Z 2025-09-07T06:56:23.1238433Z dynamo/test_input_attr_tracking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_input_attr_tracking_1.1_4d3def7c4d6d849f_.log 2025-09-07T06:56:23.1243064Z Running 12 items in this shard: test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_complex_attr_access_with_graph_breaks, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_complex_attr_access_with_inline_reconstruct, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_complex_attr_access_without_graph_breaks, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_const_property_assigned_on_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_const_property_on_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_guards_correctly_property_assigned_on_tensor_type_change, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_guards_correctly_property_assigned_on_tensor_type_change_inductor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_set_data_on_input_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_set_data_on_scoped_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_set_data_on_user_defined_class_input_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_tensor_property_assigned_on_tensor, test/dynamo/test_input_attr_tracking.py::TestInputAttrTracking::test_tensor_property_on_tensor 2025-09-07T06:56:23.1247598Z 2025-09-07T06:56:23.1247785Z Running dynamo/test_install_free_tensors 1/1 ... 
[2025-09-07 06:56:23.123627] 2025-09-07T06:56:23.1248127Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:23.1248893Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_install_free_tensors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:56:23.123967] 2025-09-07T06:56:59.2706475Z 2025-09-07T06:56:59.2711414Z dynamo/test_install_free_tensors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_install_free_tensors_1.1_318733cacb2d6744_.log 2025-09-07T06:56:59.2719451Z Running 25 items in this shard: test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_breadth_linear, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_nested_linear, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_nets_as_input, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_optimizing_buffer_and_param_in_input, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_optimizing_buffer_in_input, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_optimizing_linear, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_optimizing_params_in_input, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_resnet_structure, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_simple_batchnorm, test/dynamo/test_install_free_tensors.py::InstallParamsAsGraphAttrTests::test_transformer, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_dict_of_tensor, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_global_tensor_export, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_list_of_tensor, 
test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_modify_net_state, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_nested_list_of_tensor, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_nonlocal_closure, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_optimizing_buffer_and_param_in_input, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_optimizing_buffer_in_input, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_optimizing_params_in_input, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_resnet_structure, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_simple_batchnorm, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_simple_linear, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_tensors_as_nn_attr, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_transformer, test/dynamo/test_install_free_tensors.py::InstallParamsWhenExport::test_user_defined_object 2025-09-07T06:56:59.2729837Z 2025-09-07T06:56:59.2729955Z Running dynamo/test_nested_graph_breaks 1/1 ... [2025-09-07 06:56:59.270616] 2025-09-07T06:56:59.2730148Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:56:59.2730565Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_nested_graph_breaks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:56:59.270911] 2025-09-07T06:57:07.3007400Z 2025-09-07T06:57:07.3008481Z dynamo/test_nested_graph_breaks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nested_graph_breaks_1.1_c96643a9dc70a451_.log 2025-09-07T06:57:07.3014277Z Running 14 items in this shard: test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_cells, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_differing_arg_nums, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_differing_locals_nums, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_doubly_nested_graph_break, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_inactive_ctx_manager, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_nested_graph_break_in_loop, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_nested_graph_break_in_try_block, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_no_recompiles, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_side_effects_cells, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_side_effects_globals, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_side_effects_globals_different_module, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_single_graph_break, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_single_graph_break_repeat, test/dynamo/test_nested_graph_breaks.py::NestedGraphBreakTests::test_supported_ctx_manager 2025-09-07T06:57:07.3016299Z 2025-09-07T06:57:07.3022092Z Running dynamo/test_nops 1/1 ... 
[2025-09-07 06:57:07.300633] 2025-09-07T06:57:07.3022281Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:07.3022930Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_nops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:57:07.300840] 2025-09-07T06:57:09.6700873Z 2025-09-07T06:57:09.6701705Z dynamo/test_nops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nops_1.1_5e5f4566a7b0e2d6_.log 2025-09-07T06:57:09.6702642Z Running 4 items in this shard: test/dynamo/test_nops.py::NopTests::test1, test/dynamo/test_nops.py::NopTests::test2, test/dynamo/test_nops.py::NopTests::test3, test/dynamo/test_nops.py::NopTests::test_extended_args 2025-09-07T06:57:09.6703123Z 2025-09-07T06:57:09.6705237Z Running dynamo/test_optimizers 1/1 ... [2025-09-07 06:57:09.670180] 2025-09-07T06:57:09.6705562Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:09.6709627Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_optimizers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:57:09.670419] 2025-09-07T06:57:13.0408247Z 2025-09-07T06:57:13.0409377Z dynamo/test_optimizers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_optimizers_1.1_1730cb6dd81b5cfb_.log 2025-09-07T06:57:13.0411069Z Running 3 items in this shard: test/dynamo/test_optimizers.py::End2EndTests::test_init_group, test/dynamo/test_optimizers.py::End2EndTests::test_optimizing_over_tensor_with_requires_grad, test/dynamo/test_optimizers.py::End2EndTests::test_state_dict 2025-09-07T06:57:13.0412771Z 2025-09-07T06:57:13.0413000Z Running dynamo/test_pgo 1/1 ... 
[2025-09-07 06:57:13.040569] 2025-09-07T06:57:13.0413429Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:13.0414490Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_pgo.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:57:13.040777] 2025-09-07T06:57:28.1842935Z 2025-09-07T06:57:28.1844012Z dynamo/test_pgo 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_pgo_1.1_cf978faca03989b0_.log 2025-09-07T06:57:28.1847535Z Running 12 items in this shard: test/dynamo/test_pgo.py::PgoTest::test_basic, test/dynamo/test_pgo.py::PgoTest::test_different_file_paths_local_pgo, test/dynamo/test_pgo.py::PgoTest::test_distinct_compile_id, test/dynamo/test_pgo.py::PgoTest::test_njt, test/dynamo/test_pgo.py::PgoTest::test_no_empty_graph_allowlist, test/dynamo/test_pgo.py::PgoTest::test_pgo_dynamic_false, test/dynamo/test_pgo.py::PgoTest::test_pgo_dynamic_params, test/dynamo/test_pgo.py::PgoTest::test_profile_merges, test/dynamo/test_pgo.py::PgoTest::test_remote_basic, test/dynamo/test_pgo.py::PgoTest::test_sticky_pgo_read_write, test/dynamo/test_pgo.py::PgoTest::test_whitelist_ints_floats, test/dynamo/test_pgo.py::PgoTest::test_whitelist_suggestion 2025-09-07T06:57:28.1850396Z 2025-09-07T06:57:28.1850649Z Running dynamo/test_pre_dispatch 1/1 ... [2025-09-07 06:57:28.184233] 2025-09-07T06:57:28.1851125Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:28.1854902Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_pre_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:57:28.184494] 2025-09-07T06:57:30.5539137Z 2025-09-07T06:57:30.5540412Z dynamo/test_pre_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_pre_dispatch_1.1_2327b0126874e7cc_.log 2025-09-07T06:57:30.5542160Z Running 3 items in this shard: test/dynamo/test_pre_dispatch.py::PreDispatchTests::test_autocast_simple, test/dynamo/test_pre_dispatch.py::PreDispatchTests::test_enable_grad_and_no_grad, test/dynamo/test_pre_dispatch.py::PreDispatchTests::test_no_grad_simple 2025-09-07T06:57:30.5544125Z 2025-09-07T06:57:30.5544418Z Running dynamo/test_precompile_context 1/1 ... [2025-09-07 06:57:30.554055] 2025-09-07T06:57:30.5553444Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:30.5554407Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_precompile_context.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:57:30.554271] 2025-09-07T06:57:40.5413006Z 2025-09-07T06:57:40.5414195Z dynamo/test_precompile_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_precompile_context_1.1_34d6e56f54f04a5e_.log 2025-09-07T06:57:40.5416132Z Running 3 items in this shard: test/dynamo/test_precompile_context.py::PrecompileContextTests::test_basic, test/dynamo/test_precompile_context.py::PrecompileContextTests::test_editable, test/dynamo/test_precompile_context.py::PrecompileContextTests::test_serialize_by_key 2025-09-07T06:57:40.5417426Z 2025-09-07T06:57:40.5417674Z Running dynamo/test_profiler 1/1 ... 
[2025-09-07 06:57:40.541306] 2025-09-07T06:57:40.5418119Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:40.5419457Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:57:40.541582] 2025-09-07T06:57:50.9750322Z 2025-09-07T06:57:50.9751596Z dynamo/test_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_profiler_1.1_3062faaa04b73382_.log 2025-09-07T06:57:50.9754149Z Running 11 items in this shard: test/dynamo/test_profiler.py::DynamoProfilerTests::test_dynamo_timed_profiling_backend_compile, test/dynamo/test_profiler.py::DynamoProfilerTests::test_dynamo_timed_profiling_isolated, test/dynamo/test_profiler.py::DynamoProfilerTests::test_execution_trace_dynamic_shapes, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_compilation, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_list_compilation, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_runtime, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_cache_lookup, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_cache_lookup_profiler_step, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_dynamo_compiled_region, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_enabled, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_record_function_ignore 2025-09-07T06:57:50.9756239Z 2025-09-07T06:57:50.9756385Z Running dynamo/test_python_autograd 1/1 ... 
[2025-09-07 06:57:50.974932] 2025-09-07T06:57:50.9756825Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:50.9763679Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_python_autograd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:57:50.975184] 2025-09-07T06:57:53.4947169Z 2025-09-07T06:57:53.4949908Z dynamo/test_python_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_autograd_1.1_b0dddf9aaaeea22d_.log 2025-09-07T06:57:53.4952193Z Running 5 items in this shard: test/dynamo/test_python_autograd.py::TestPythonAutograd::test_backwards1, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_backwards2, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_forwards1, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_forwards2, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_split 2025-09-07T06:57:53.4953760Z 2025-09-07T06:57:53.4954023Z Running dynamo/test_python_dispatcher 1/1 ... [2025-09-07 06:57:53.494581] 2025-09-07T06:57:53.4955182Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:53.4956281Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_python_dispatcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:57:53.494895] 2025-09-07T06:57:56.2654743Z 2025-09-07T06:57:56.2656036Z dynamo/test_python_dispatcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_dispatcher_1.1_e4a5e1a248e57e7a_.log 2025-09-07T06:57:56.2664312Z Running 6 items in this shard: test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key1, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key2, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key3, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key4, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key_set_guard, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_functorch_interpreter 2025-09-07T06:57:56.2665107Z 2025-09-07T06:57:56.2665199Z Running dynamo/test_recompile_ux 1/1 ... [2025-09-07 06:57:56.265374] 2025-09-07T06:57:56.2665366Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:57:56.2665766Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_recompile_ux.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:57:56.265665] 2025-09-07T06:58:02.4938981Z 2025-09-07T06:58:02.4940569Z dynamo/test_recompile_ux 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompile_ux_1.1_a9a655f6e9263458_.log 2025-09-07T06:58:02.4944294Z Running 10 items in this shard: test/dynamo/test_recompile_ux.py::RecompileUxTests::test_drop_cache_on_skip, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_dynamic_input, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_fail_on_recompile_limit_hit, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_loop_torture, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_mismatched_type, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_multiple_guard_fails, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_multiple_guard_fails_report_all, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_nvfuser_guards, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_recompile_child_run_only, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_verbose_tensor_check 2025-09-07T06:58:02.4947523Z 2025-09-07T06:58:02.4947779Z Running dynamo/test_reconstruct 1/1 ... [2025-09-07 06:58:02.493838] 2025-09-07T06:58:02.4954295Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:58:02.4954697Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_reconstruct.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:58:02.494166] 2025-09-07T06:58:08.6244572Z 2025-09-07T06:58:08.6245235Z dynamo/test_reconstruct 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reconstruct_1.1_5b34f875cc4e9c96_.log 2025-09-07T06:58:08.6250023Z Running 16 items in this shard: test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_clear_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_del_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_get_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_optimize_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_pop_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_popitem_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_ConstDict_popitem_reconstruct_graph_break, test/dynamo/test_reconstruct.py::ReconstructTest::test_create_dict_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_functional_call_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_functional_call_reconstruct_2, test/dynamo/test_reconstruct.py::ReconstructTest::test_graph_break_in_wrapped_nested_function, test/dynamo/test_reconstruct.py::ReconstructTest::test_graph_break_in_wrapped_skipped_function, test/dynamo/test_reconstruct.py::ReconstructTest::test_graph_break_in_wrapped_user_function, test/dynamo/test_reconstruct.py::ReconstructTest::test_graph_break_in_wrapped_user_method, test/dynamo/test_reconstruct.py::ReconstructTest::test_tma_experimental_reconstruct, test/dynamo/test_reconstruct.py::ReconstructTest::test_tma_stable_reconstruct 2025-09-07T06:58:08.6252298Z 2025-09-07T06:58:08.6252436Z Running dynamo/test_reorder_logs 1/1 ... 
[2025-09-07 06:58:08.624241] 2025-09-07T06:58:08.6252707Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:58:08.6253125Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:58:08.624938] 2025-09-07T06:58:11.4451672Z 2025-09-07T06:58:11.4453628Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_112905d7103e73d4_.log 2025-09-07T06:58:11.4460675Z Running 14 items in this shard: test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method0_fn0_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method1_fn1_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method2_fn2_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method3_fn3_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method4_fn4_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method5_fn5_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method6_fn6_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method7_fn7_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_dont_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_custom_log_fn, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print, 
test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print_graph_break, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_warnings 2025-09-07T06:58:11.4464004Z 2025-09-07T06:58:11.4464135Z Running dynamo/test_repros 1/1 ... [2025-09-07 06:58:11.444829] 2025-09-07T06:58:11.4464388Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:58:11.4464991Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_repros.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:58:11.445053] 2025-09-07T06:59:42.7608538Z 2025-09-07T06:59:42.7609075Z PRINTING LOG FILE of dynamo/test_repros 1/1 (test/test-reports/dynamo.test_repros_1.1_b10c530c279eac19_.log) 2025-09-07T06:59:42.7610068Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T06:59:42.7618119Z import pkg_resources 2025-09-07T06:59:42.7618800Z Test results will be stored in test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-c96249d0993a3f5e.xml 2025-09-07T06:59:42.7619142Z ============================= test session starts ============================== 2025-09-07T06:59:42.7619443Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T06:59:42.7619677Z cachedir: .pytest_cache 2025-09-07T06:59:42.7619983Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T06:59:42.7620285Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T06:59:42.7620428Z configfile: pytest.ini 2025-09-07T06:59:42.7620707Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T06:59:42.7621007Z collecting ... collected 333 items 2025-09-07T06:59:42.7621188Z stepcurrent: Cannot find last run test, not skipping 2025-09-07T06:59:42.7653587Z Running 333 items in this shard: test/dynamo/test_repros.py::LRUCacheWarningTests::test_lru_cache_warning_issued_during_tracing, test/dynamo/test_repros.py::ReproTests::test_312_local_cell_overlap, test/dynamo/test_repros.py::ReproTests::test_Size, test/dynamo/test_repros.py::ReproTests::test_abc_setattr, test/dynamo/test_repros.py::ReproTests::test_add_complex_conj, test/dynamo/test_repros.py::ReproTests::test_add_sub_alpha_out, test/dynamo/test_repros.py::ReproTests::test_addr_alpha_beta_out, test/dynamo/test_repros.py::ReproTests::test_amp_foreach_fake_impl, test/dynamo/test_repros.py::ReproTests::test_aot_autograd_runtime_wrapper_prologue_profiled, test/dynamo/test_repros.py::ReproTests::test_as_strided_on_base_with_mutation_works, test/dynamo/test_repros.py::ReproTests::test_as_strided_on_existing_view_banned, 
test/dynamo/test_repros.py::ReproTests::test_attached_attribute_in_dir, test/dynamo/test_repros.py::ReproTests::test_autograd_function_graph_break, test/dynamo/test_repros.py::ReproTests::test_avoid_dupe_specialization, test/dynamo/test_repros.py::ReproTests::test_batch_encoding_clone_inputs, test/dynamo/test_repros.py::ReproTests::test_batch_norm_act, test/dynamo/test_repros.py::ReproTests::test_batchnorm_e2e, test/dynamo/test_repros.py::ReproTests::test_bigbird_unsqueeze_inplace, test/dynamo/test_repros.py::ReproTests::test_bitwise_op_guard, test/dynamo/test_repros.py::ReproTests::test_bitwise_print_precedence, test/dynamo/test_repros.py::ReproTests::test_boxes_len, test/dynamo/test_repros.py::ReproTests::test_build_map_unpack_with_call, test/dynamo/test_repros.py::ReproTests::test_c_defined_metaclass, test/dynamo/test_repros.py::ReproTests::test_changing_stride, test/dynamo/test_repros.py::ReproTests::test_chunk_reformer_ff, test/dynamo/test_repros.py::ReproTests::test_class_member, test/dynamo/test_repros.py::ReproTests::test_classmethod_with_slots, test/dynamo/test_repros.py::ReproTests::test_compilation_metrics_on_error, test/dynamo/test_repros.py::ReproTests::test_compile_complex_conj, test/dynamo/test_repros.py::ReproTests::test_compile_copy__int_overload, test/dynamo/test_repros.py::ReproTests::test_const_dict_keyerror, test/dynamo/test_repros.py::ReproTests::test_contains_range_constprop, test/dynamo/test_repros.py::ReproTests::test_convert_boxes_to_pooler_format, test/dynamo/test_repros.py::ReproTests::test_copy_weird_strides, test/dynamo/test_repros.py::ReproTests::test_create_rand_mask_from_inputs, test/dynamo/test_repros.py::ReproTests::test_dalle2_maybe, test/dynamo/test_repros.py::ReproTests::test_data_attr_mutation_after_saved_for_bw, test/dynamo/test_repros.py::ReproTests::test_dataclass_in_module, test/dynamo/test_repros.py::ReproTests::test_dataclass_init_with_default_factory_with_inputs, 
test/dynamo/test_repros.py::ReproTests::test_ddp_checkpoint, test/dynamo/test_repros.py::ReproTests::test_dedup_global, test/dynamo/test_repros.py::ReproTests::test_deferred_runtime_asserts, test/dynamo/test_repros.py::ReproTests::test_delattr, test/dynamo/test_repros.py::ReproTests::test_delattr_raises, test/dynamo/test_repros.py::ReproTests::test_delattr_return, test/dynamo/test_repros.py::ReproTests::test_delete_local_error, test/dynamo/test_repros.py::ReproTests::test_deleted_compile_wrapper_segfault, test/dynamo/test_repros.py::ReproTests::test_delsubscr, test/dynamo/test_repros.py::ReproTests::test_delsubscr_raises, test/dynamo/test_repros.py::ReproTests::test_detectron2_instances_cat, test/dynamo/test_repros.py::ReproTests::test_disabling_unpack_hooks_within_compiled_region, test/dynamo/test_repros.py::ReproTests::test_distributions_subclass, test/dynamo/test_repros.py::ReproTests::test_do_paste_mask, test/dynamo/test_repros.py::ReproTests::test_dont_aggressively_write_assert, test/dynamo/test_repros.py::ReproTests::test_dont_dce_rand, test/dynamo/test_repros.py::ReproTests::test_dropout_inline, test/dynamo/test_repros.py::ReproTests::test_dynamic_shape_disable_duck_size, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_double_not_equal, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_float_guard, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_implicit_guard, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_right_side, test/dynamo/test_repros.py::ReproTests::test_ellipsis, test/dynamo/test_repros.py::ReproTests::test_embedding_backward_broadcasting_decomp, test/dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_False, test/dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_True, test/dynamo/test_repros.py::ReproTests::test_empty_list_contains_with_jump, test/dynamo/test_repros.py::ReproTests::test_empty_out_dynamic, 
test/dynamo/test_repros.py::ReproTests::test_enum, test/dynamo/test_repros.py::ReproTests::test_ephemeral_module, test/dynamo/test_repros.py::ReproTests::test_error_return_without_exception_set, test/dynamo/test_repros.py::ReproTests::test_exception_in_dynamo_handling, test/dynamo/test_repros.py::ReproTests::test_exec_import, test/dynamo/test_repros.py::ReproTests::test_exec_wildcard_import, test/dynamo/test_repros.py::ReproTests::test_flip_bad_accuracy, test/dynamo/test_repros.py::ReproTests::test_for_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_for_loop_graph_break_before, test/dynamo/test_repros.py::ReproTests::test_foreach_decomp_arg_names, test/dynamo/test_repros.py::ReproTests::test_fsdp_set_input_mutation_applied_when_input_gets_no_gradients, test/dynamo/test_repros.py::ReproTests::test_function_in_skipfiles, test/dynamo/test_repros.py::ReproTests::test_functools_wraps, test/dynamo/test_repros.py::ReproTests::test_gan_repro_trying_to_backward_through_the_graph_a_second_time, test/dynamo/test_repros.py::ReproTests::test_generator_dealloc, test/dynamo/test_repros.py::ReproTests::test_get_parameter_dtype, test/dynamo/test_repros.py::ReproTests::test_get_type_hints, test/dynamo/test_repros.py::ReproTests::test_global_fn_mutation, test/dynamo/test_repros.py::ReproTests::test_grad, test/dynamo/test_repros.py::ReproTests::test_grad_mode_carrying_correct_state_after_graph_break, test/dynamo/test_repros.py::ReproTests::test_grad_references_cleared, test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance, test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance_pep585, test/dynamo/test_repros.py::ReproTests::test_graph_break_unsupported_fake, test/dynamo/test_repros.py::ReproTests::test_guard_default_device, test/dynamo/test_repros.py::ReproTests::test_guard_fail_nested_tuple, test/dynamo/test_repros.py::ReproTests::test_guard_fail_tensor_bool, 
test/dynamo/test_repros.py::ReproTests::test_guard_ordering_shape_fail, test/dynamo/test_repros.py::ReproTests::test_guard_with_tuple_mutation, test/dynamo/test_repros.py::ReproTests::test_hasattr_builtin, test/dynamo/test_repros.py::ReproTests::test_hf_bigbird_unsqueeze, test/dynamo/test_repros.py::ReproTests::test_hf_classinstantier, test/dynamo/test_repros.py::ReproTests::test_hf_gelu_inline, test/dynamo/test_repros.py::ReproTests::test_hf_model_output, test/dynamo/test_repros.py::ReproTests::test_hf_t5_forward, test/dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_inference, test/dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_training, test/dynamo/test_repros.py::ReproTests::test_iadd_graph_break, test/dynamo/test_repros.py::ReproTests::test_incompatible_configs, test/dynamo/test_repros.py::ReproTests::test_indexing_with_list, test/dynamo/test_repros.py::ReproTests::test_inductor_dynamic_shapes_broadcasting, test/dynamo/test_repros.py::ReproTests::test_inductor_no_recursionerror_on_for_loops, test/dynamo/test_repros.py::ReproTests::test_inductor_rng_default_dtype, test/dynamo/test_repros.py::ReproTests::test_inference_mode_dynamic_shapes, test/dynamo/test_repros.py::ReproTests::test_inlining_cornercase, test/dynamo/test_repros.py::ReproTests::test_inplace_unsqueeze_input, test/dynamo/test_repros.py::ReproTests::test_int_format, test/dynamo/test_repros.py::ReproTests::test_intermediate_leaf_requires_grad, test/dynamo/test_repros.py::ReproTests::test_invalid_seq_unpack, test/dynamo/test_repros.py::ReproTests::test_is_make_fx_tracing, test/dynamo/test_repros.py::ReproTests::test_is_symbolic_tracing, test/dynamo/test_repros.py::ReproTests::test_isinstance_dtype, test/dynamo/test_repros.py::ReproTests::test_isinstance_storage, test/dynamo/test_repros.py::ReproTests::test_issue111522, test/dynamo/test_repros.py::ReproTests::test_issue111918, test/dynamo/test_repros.py::ReproTests::test_issue114171, test/dynamo/test_repros.py::ReproTests::test_issue126128, 
test/dynamo/test_repros.py::ReproTests::test_issue134451, test/dynamo/test_repros.py::ReproTests::test_issue1466_size_aot_autograd, test/dynamo/test_repros.py::ReproTests::test_issue175, test/dynamo/test_repros.py::ReproTests::test_jit_script_defaults, test/dynamo/test_repros.py::ReproTests::test_jit_trace_errors, test/dynamo/test_repros.py::ReproTests::test_kwargs_out_list_variable, test/dynamo/test_repros.py::ReproTests::test_list_aliasing, test/dynamo/test_repros.py::ReproTests::test_list_index, test/dynamo/test_repros.py::ReproTests::test_list_index_not_found, test/dynamo/test_repros.py::ReproTests::test_list_index_tensor_unsupported, test/dynamo/test_repros.py::ReproTests::test_list_reverse, test/dynamo/test_repros.py::ReproTests::test_list_self_reference, test/dynamo/test_repros.py::ReproTests::test_listcomp, test/dynamo/test_repros.py::ReproTests::test_longformer_chunk, test/dynamo/test_repros.py::ReproTests::test_longtensor_list, test/dynamo/test_repros.py::ReproTests::test_lru_cache_tracing, test/dynamo/test_repros.py::ReproTests::test_maml_item_capture, test/dynamo/test_repros.py::ReproTests::test_maml_no_item_capture, test/dynamo/test_repros.py::ReproTests::test_many_overlapping_inputs_does_not_explode_guards, test/dynamo/test_repros.py::ReproTests::test_many_views_with_mutation, test/dynamo/test_repros.py::ReproTests::test_map_with_multiple_args, test/dynamo/test_repros.py::ReproTests::test_maybe_multiply_symint, test/dynamo/test_repros.py::ReproTests::test_mem_leak_guards, test/dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list1, test/dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list2, test/dynamo/test_repros.py::ReproTests::test_method_overriding, test/dynamo/test_repros.py::ReproTests::test_module_in_skipfiles, test/dynamo/test_repros.py::ReproTests::test_modules, test/dynamo/test_repros.py::ReproTests::test_multi_dot_import, test/dynamo/test_repros.py::ReproTests::test_multi_import, 
test/dynamo/test_repros.py::ReproTests::test_named_buffers, test/dynamo/test_repros.py::ReproTests::test_nanmean_out, test/dynamo/test_repros.py::ReproTests::test_negative_floor_div_solve, test/dynamo/test_repros.py::ReproTests::test_negative_shape_guard, test/dynamo/test_repros.py::ReproTests::test_nested_while_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_nn_module_callable, test/dynamo/test_repros.py::ReproTests::test_nn_module_property_closure, test/dynamo/test_repros.py::ReproTests::test_nn_module_stack_bc, test/dynamo/test_repros.py::ReproTests::test_nn_param_freevar_codegen, test/dynamo/test_repros.py::ReproTests::test_nn_parameter, test/dynamo/test_repros.py::ReproTests::test_nn_parameter_ctor_graph_breaks, test/dynamo/test_repros.py::ReproTests::test_nn_parametrize, test/dynamo/test_repros.py::ReproTests::test_no_grad_inline, test/dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame, test/dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame_ctx_manager, test/dynamo/test_repros.py::ReproTests::test_nonconst_issubclass, test/dynamo/test_repros.py::ReproTests::test_not_rewrite_assert_for_other_errors, test/dynamo/test_repros.py::ReproTests::test_nullcontext1, test/dynamo/test_repros.py::ReproTests::test_nullcontext2, test/dynamo/test_repros.py::ReproTests::test_numpy_not_ndarray_recompiles, test/dynamo/test_repros.py::ReproTests::test_numpy_tobytes_no_error, test/dynamo/test_repros.py::ReproTests::test_odict_get_item_index_name, test/dynamo/test_repros.py::ReproTests::test_omegaconf_dictconfig, test/dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_contains, test/dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_iter, test/dynamo/test_repros.py::ReproTests::test_ones_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_optim_state_references_cleared, test/dynamo/test_repros.py::ReproTests::test_optimized_deepcopy, test/dynamo/test_repros.py::ReproTests::test_optimized_module_patched_init, 
test/dynamo/test_repros.py::ReproTests::test_optimized_module_training, test/dynamo/test_repros.py::ReproTests::test_os_fspath, test/dynamo/test_repros.py::ReproTests::test_out_nested_cell_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_nested_cell_tuple_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_none, test/dynamo/test_repros.py::ReproTests::test_out_overload_non_contiguous, test/dynamo/test_repros.py::ReproTests::test_out_root_cell_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_root_cell_tuple_shape_change, test/dynamo/test_repros.py::ReproTests::test_output_aliases_intermediate, test/dynamo/test_repros.py::ReproTests::test_overlapping_inputs_with_dynamic_shapes_error, test/dynamo/test_repros.py::ReproTests::test_overwriting_params, test/dynamo/test_repros.py::ReproTests::test_partially_initialized_module_property, test/dynamo/test_repros.py::ReproTests::test_partitioner_activation_memory_budget_with_unbacked_symints, test/dynamo/test_repros.py::ReproTests::test_partitioner_cse_respects_mutation_boundaries, test/dynamo/test_repros.py::ReproTests::test_pointless_graph_removal, test/dynamo/test_repros.py::ReproTests::test_primtorch, test/dynamo/test_repros.py::ReproTests::test_primtorch_no_graph_break, test/dynamo/test_repros.py::ReproTests::test_randint_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_recursive_map, test/dynamo/test_repros.py::ReproTests::test_reformer_eval, test/dynamo/test_repros.py::ReproTests::test_reformer_min_chunk_len, test/dynamo/test_repros.py::ReproTests::test_reformer_sorting, test/dynamo/test_repros.py::ReproTests::test_reformer_train, test/dynamo/test_repros.py::ReproTests::test_reinplacing, test/dynamo/test_repros.py::ReproTests::test_relative_import, test/dynamo/test_repros.py::ReproTests::test_relative_import_no_modulename, test/dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode1, 
test/dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode2, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass1, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass2, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass3, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_mixed_grad, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_scalar, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_tensor, test/dynamo/test_repros.py::ReproTests::test_return_weakref, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_dont_change_bytecode, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_noop, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_msg, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_non_string_msg, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_without_msg, test/dynamo/test_repros.py::ReproTests::test_rng_state, test/dynamo/test_repros.py::ReproTests::test_seq_append_list, test/dynamo/test_repros.py::ReproTests::test_setattr_requires_grad_graph_breaks, test/dynamo/test_repros.py::ReproTests::test_setitem_boolean_mask_diff, test/dynamo/test_repros.py::ReproTests::test_setitem_tensor_prop, test/dynamo/test_repros.py::ReproTests::test_setitem_tuple_boolean_mask_diff, test/dynamo/test_repros.py::ReproTests::test_sigmoid_out, test/dynamo/test_repros.py::ReproTests::test_sigmoid_out2, test/dynamo/test_repros.py::ReproTests::test_size_typematch, test/dynamo/test_repros.py::ReproTests::test_slice_into_list_mutable, test/dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape, test/dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape_setitem, test/dynamo/test_repros.py::ReproTests::test_sort_out, test/dynamo/test_repros.py::ReproTests::test_sort_out2, test/dynamo/test_repros.py::ReproTests::test_specialized_stride, 
test/dynamo/test_repros.py::ReproTests::test_split_with_sizes_aot_autograd, test/dynamo/test_repros.py::ReproTests::test_staticmethod_allow_in_graph, test/dynamo/test_repros.py::ReproTests::test_stk_sdd_is_transposed, test/dynamo/test_repros.py::ReproTests::test_stop_iteration_reconstruct, test/dynamo/test_repros.py::ReproTests::test_str_isalnum, test/dynamo/test_repros.py::ReproTests::test_string_format, test/dynamo/test_repros.py::ReproTests::test_subclass_graph_output_repro, test/dynamo/test_repros.py::ReproTests::test_super_classmethod, test/dynamo/test_repros.py::ReproTests::test_super_classmethod_inheritance, test/dynamo/test_repros.py::ReproTests::test_super_diamond, test/dynamo/test_repros.py::ReproTests::test_super_in_staticmethod, test/dynamo/test_repros.py::ReproTests::test_super_staticmethod, test/dynamo/test_repros.py::ReproTests::test_swin_base_tensor_attr, test/dynamo/test_repros.py::ReproTests::test_symint_bitwise, test/dynamo/test_repros.py::ReproTests::test_symnode_is_not_op, test/dynamo/test_repros.py::ReproTests::test_symnode_is_op, test/dynamo/test_repros.py::ReproTests::test_sys_monitoring, test/dynamo/test_repros.py::ReproTests::test_tensor_data_kwarg, test/dynamo/test_repros.py::ReproTests::test_tensor_isinstance_tuple, test/dynamo/test_repros.py::ReproTests::test_tensor_item, test/dynamo/test_repros.py::ReproTests::test_tensor_random, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func3, 
test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_mismatched_dtype, test/dynamo/test_repros.py::ReproTests::test_tensor_split, test/dynamo/test_repros.py::ReproTests::test_tensor_split_within_device_cm, test/dynamo/test_repros.py::ReproTests::test_tensor_uniform, test/dynamo/test_repros.py::ReproTests::test_threading_local, test/dynamo/test_repros.py::ReproTests::test_tokenization, test/dynamo/test_repros.py::ReproTests::test_torch_compile_in_compile_frame, test/dynamo/test_repros.py::ReproTests::test_torch_ops_aten, test/dynamo/test_repros.py::ReproTests::test_torch_tensor_ops, test/dynamo/test_repros.py::ReproTests::test_torch_tensor_ops_no_graph_break, test/dynamo/test_repros.py::ReproTests::test_torch_variable_type, test/dynamo/test_repros.py::ReproTests::test_torchname, test/dynamo/test_repros.py::ReproTests::test_trace_functional_tensor_with, test/dynamo/test_repros.py::ReproTests::test_tuple_enum_as_key_dict, test/dynamo/test_repros.py::ReproTests::test_typed_dict, test/dynamo/test_repros.py::ReproTests::test_typed_dict_total, test/dynamo/test_repros.py::ReproTests::test_udf_classes_reconstruction, test/dynamo/test_repros.py::ReproTests::test_unbacked_arange_in_bounds, test/dynamo/test_repros.py::ReproTests::test_unbind_copy_out, test/dynamo/test_repros.py::ReproTests::test_unpack_hooks_can_be_disabled, test/dynamo/test_repros.py::ReproTests::test_unpack_hooks_dont_run_during_tracing, test/dynamo/test_repros.py::ReproTests::test_unspecialized_nn_module_with_torch_variable_attribute, test/dynamo/test_repros.py::ReproTests::test_unsqueeze_mul_strides, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager, 
test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init_graph_break, test/dynamo/test_repros.py::ReproTests::test_user_defined_iter, test/dynamo/test_repros.py::ReproTests::test_user_defined_object_callable, test/dynamo/test_repros.py::ReproTests::test_validate_model_kwargs, test/dynamo/test_repros.py::ReproTests::test_vc_bumped_in_inference_graph, test/dynamo/test_repros.py::ReproTests::test_vdd_duplicate_error, test/dynamo/test_repros.py::ReproTests::test_view_dtype_overload, test/dynamo/test_repros.py::ReproTests::test_weakref, test/dynamo/test_repros.py::ReproTests::test_weakref_callback, test/dynamo/test_repros.py::ReproTests::test_weakref_construction, test/dynamo/test_repros.py::ReproTests::test_weakref_del, test/dynamo/test_repros.py::ReproTests::test_weakref_proxy, test/dynamo/test_repros.py::ReproTests::test_weakref_reconstruct, test/dynamo/test_repros.py::ReproTests::test_while_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_while_loop_graph_break_inside_call_function, test/dynamo/test_repros.py::ReproTests::test_with_on_graph_break_inst, test/dynamo/test_repros.py::ReproTests::test_with_on_graph_break_nested, test/dynamo/test_repros.py::ReproTests::test_zeros_out_dynamic, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_cuda_sync_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_data_dependent_error_log_no_print_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_deepcopy_constant_tensor_in_aot_bwd_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_safe_grad_warning_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_user_warnings_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_warnings_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_flash_attn_backward_mixed_strides_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_getattr_return_cuda, 
test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_guard_default_device_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_megablocks_moe_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_memleak_when_graph_input_has_tensor_attr_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_module_attribute_error_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_named_tuple_vt_clone_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_norm_dtype_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partitioner_saves_weights_for_bw_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sdpa_dynamic_shapes_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sub_alpha_scalar_repro_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_tensor_size_hasattr_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_torch_cuda_is_initialized_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_truthiness_of_symints_no_recompiles_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_udf_class_source_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_zero_dim_param_mixed_device_grad_cuda 2025-09-07T06:59:42.7684164Z 2025-09-07T06:59:42.7684467Z dynamo/test_repros.py::LRUCacheWarningTests::test_lru_cache_warning_issued_during_tracing <- ../../../../opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/logging_utils.py PASSED [0.4551s] [ 0%] 2025-09-07T06:59:42.7684885Z dynamo/test_repros.py::ReproTests::test_312_local_cell_overlap PASSED [0.1190s] [ 0%] 2025-09-07T06:59:42.7685120Z dynamo/test_repros.py::ReproTests::test_Size PASSED [0.0272s] [ 0%] 2025-09-07T06:59:42.7685338Z dynamo/test_repros.py::ReproTests::test_abc_setattr PASSED [0.0240s] [ 1%] 2025-09-07T06:59:42.7685560Z dynamo/test_repros.py::ReproTests::test_add_complex_conj PASSED [1.5899s] [ 1%] 2025-09-07T06:59:42.7685784Z dynamo/test_repros.py::ReproTests::test_add_sub_alpha_out PASSED 
[4.0950s] [ 1%] 2025-09-07T06:59:42.7686400Z dynamo/test_repros.py::ReproTests::test_addr_alpha_beta_out SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156641 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 2%] 2025-09-07T06:59:42.7687127Z dynamo/test_repros.py::ReproTests::test_amp_foreach_fake_impl PASSED [0.0366s] [ 2%] 2025-09-07T06:59:42.7687765Z dynamo/test_repros.py::ReproTests::test_aot_autograd_runtime_wrapper_prologue_profiled SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156678 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 2%] 2025-09-07T06:59:42.7688483Z dynamo/test_repros.py::ReproTests::test_as_strided_on_base_with_mutation_works PASSED [0.0331s] [ 3%] 2025-09-07T06:59:42.7688762Z dynamo/test_repros.py::ReproTests::test_as_strided_on_existing_view_banned PASSED [0.0188s] [ 3%] 2025-09-07T06:59:42.7689020Z dynamo/test_repros.py::ReproTests::test_attached_attribute_in_dir PASSED [0.0032s] [ 3%] 2025-09-07T06:59:42.7689274Z dynamo/test_repros.py::ReproTests::test_autograd_function_graph_break PASSED [0.0483s] [ 3%] 2025-09-07T06:59:42.7689523Z dynamo/test_repros.py::ReproTests::test_avoid_dupe_specialization PASSED [0.1233s] [ 4%] 2025-09-07T06:59:42.7689771Z dynamo/test_repros.py::ReproTests::test_batch_encoding_clone_inputs PASSED [0.0008s] [ 4%] 2025-09-07T06:59:42.7690004Z dynamo/test_repros.py::ReproTests::test_batch_norm_act PASSED [0.1435s] [ 4%] 2025-09-07T06:59:42.7690220Z dynamo/test_repros.py::ReproTests::test_batchnorm_e2e PASSED [0.7903s] [ 5%] 2025-09-07T06:59:42.7690448Z 
dynamo/test_repros.py::ReproTests::test_bigbird_unsqueeze_inplace PASSED [0.0499s] [ 5%] 2025-09-07T06:59:42.7690678Z dynamo/test_repros.py::ReproTests::test_bitwise_op_guard PASSED [0.4608s] [ 5%] 2025-09-07T06:59:42.7691279Z dynamo/test_repros.py::ReproTests::test_bitwise_print_precedence SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156736 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 6%] 2025-09-07T06:59:42.7691876Z dynamo/test_repros.py::ReproTests::test_boxes_len PASSED [0.0275s] [ 6%] 2025-09-07T06:59:42.7692110Z dynamo/test_repros.py::ReproTests::test_build_map_unpack_with_call PASSED [0.0315s] [ 6%] 2025-09-07T06:59:42.7692386Z dynamo/test_repros.py::ReproTests::test_c_defined_metaclass SKIPPED [0.0002s] (missing msgspec package) [ 6%] 2025-09-07T06:59:42.7692641Z dynamo/test_repros.py::ReproTests::test_changing_stride PASSED [0.4505s] [ 7%] 2025-09-07T06:59:42.7692864Z dynamo/test_repros.py::ReproTests::test_chunk_reformer_ff PASSED [0.3295s] [ 7%] 2025-09-07T06:59:42.7693128Z dynamo/test_repros.py::ReproTests::test_class_member PASSED [0.0285s] [ 7%] 2025-09-07T06:59:42.7693356Z dynamo/test_repros.py::ReproTests::test_classmethod_with_slots PASSED [0.0249s] [ 8%] 2025-09-07T06:59:42.7693704Z dynamo/test_repros.py::ReproTests::test_compilation_metrics_on_error PASSED [0.0216s] [ 8%] 2025-09-07T06:59:42.7693955Z dynamo/test_repros.py::ReproTests::test_compile_complex_conj PASSED [0.0390s] [ 8%] 2025-09-07T06:59:42.7694199Z dynamo/test_repros.py::ReproTests::test_compile_copy__int_overload PASSED [0.0280s] [ 9%] 2025-09-07T06:59:42.7694442Z dynamo/test_repros.py::ReproTests::test_const_dict_keyerror PASSED [0.0181s] [ 9%] 2025-09-07T06:59:42.7694679Z dynamo/test_repros.py::ReproTests::test_contains_range_constprop PASSED [0.0174s] [ 9%] 
2025-09-07T06:59:42.7694931Z dynamo/test_repros.py::ReproTests::test_convert_boxes_to_pooler_format PASSED [0.6124s] [ 9%] 2025-09-07T06:59:42.7695175Z dynamo/test_repros.py::ReproTests::test_copy_weird_strides PASSED [0.6401s] [ 10%] 2025-09-07T06:59:42.7695418Z dynamo/test_repros.py::ReproTests::test_create_rand_mask_from_inputs PASSED [0.0409s] [ 10%] 2025-09-07T06:59:42.7695652Z dynamo/test_repros.py::ReproTests::test_dalle2_maybe PASSED [0.0213s] [ 10%] 2025-09-07T06:59:42.7695896Z dynamo/test_repros.py::ReproTests::test_data_attr_mutation_after_saved_for_bw PASSED [0.0435s] [ 11%] 2025-09-07T06:59:42.7696599Z dynamo/test_repros.py::ReproTests::test_dataclass_in_module SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156776 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 11%] 2025-09-07T06:59:42.7697638Z dynamo/test_repros.py::ReproTests::test_dataclass_init_with_default_factory_with_inputs SKIPPED [0.0002s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156799 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 11%] 2025-09-07T06:59:42.7698368Z dynamo/test_repros.py::ReproTests::test_ddp_checkpoint SKIPPED [0.0001s] (Failing with ncc update 2.25.1 : https://github.com/pytorch/pytorch/issues/147141) [ 12%] 2025-09-07T06:59:42.7698688Z dynamo/test_repros.py::ReproTests::test_dedup_global PASSED [0.4444s] [ 12%] 2025-09-07T06:59:42.7699289Z dynamo/test_repros.py::ReproTests::test_deferred_runtime_asserts SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156817 for platform(s) linux. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 12%] 2025-09-07T06:59:42.7699880Z dynamo/test_repros.py::ReproTests::test_delattr PASSED [0.0281s] [ 12%] 2025-09-07T06:59:42.7700097Z dynamo/test_repros.py::ReproTests::test_delattr_raises PASSED [0.0103s] [ 13%] 2025-09-07T06:59:42.7704019Z dynamo/test_repros.py::ReproTests::test_delattr_return PASSED [0.0220s] [ 13%] 2025-09-07T06:59:42.7704266Z dynamo/test_repros.py::ReproTests::test_delete_local_error PASSED [0.0082s] [ 13%] 2025-09-07T06:59:42.7704524Z dynamo/test_repros.py::ReproTests::test_deleted_compile_wrapper_segfault PASSED [0.0177s] [ 14%] 2025-09-07T06:59:42.7704771Z dynamo/test_repros.py::ReproTests::test_delsubscr PASSED [0.0184s] [ 14%] 2025-09-07T06:59:42.7704994Z dynamo/test_repros.py::ReproTests::test_delsubscr_raises PASSED [0.0054s] [ 14%] 2025-09-07T06:59:42.7705229Z dynamo/test_repros.py::ReproTests::test_detectron2_instances_cat PASSED [0.0470s] [ 15%] 2025-09-07T06:59:42.7705504Z dynamo/test_repros.py::ReproTests::test_disabling_unpack_hooks_within_compiled_region PASSED [0.0414s] [ 15%] 2025-09-07T06:59:42.7705773Z dynamo/test_repros.py::ReproTests::test_distributions_subclass PASSED [0.1168s] [ 15%] 2025-09-07T06:59:42.7706064Z dynamo/test_repros.py::ReproTests::test_do_paste_mask PASSED [2.6214s] [ 15%] 2025-09-07T06:59:42.7706788Z dynamo/test_repros.py::ReproTests::test_dont_aggressively_write_assert SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156570 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 16%] 2025-09-07T06:59:42.7707779Z dynamo/test_repros.py::ReproTests::test_dont_dce_rand SKIPPED [0.0002s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156580 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 16%] 2025-09-07T06:59:42.7708372Z dynamo/test_repros.py::ReproTests::test_dropout_inline PASSED [0.0518s] [ 16%] 2025-09-07T06:59:42.7708609Z dynamo/test_repros.py::ReproTests::test_dynamic_shape_disable_duck_size PASSED [0.4481s] [ 17%] 2025-09-07T06:59:42.7708874Z dynamo/test_repros.py::ReproTests::test_dynamic_shapes_double_not_equal PASSED [0.4663s] [ 17%] 2025-09-07T06:59:42.7709132Z dynamo/test_repros.py::ReproTests::test_dynamic_shapes_float_guard PASSED [0.0236s] [ 17%] 2025-09-07T06:59:42.7709377Z dynamo/test_repros.py::ReproTests::test_dynamic_shapes_implicit_guard PASSED [0.0249s] [ 18%] 2025-09-07T06:59:42.7709673Z dynamo/test_repros.py::ReproTests::test_dynamic_shapes_right_side PASSED [0.4352s] [ 18%] 2025-09-07T06:59:42.7709894Z dynamo/test_repros.py::ReproTests::test_ellipsis PASSED [0.0383s] [ 18%] 2025-09-07T06:59:42.7710134Z dynamo/test_repros.py::ReproTests::test_embedding_backward_broadcasting_decomp PASSED [0.0631s] [ 18%] 2025-09-07T06:59:42.7710409Z dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_False PASSED [0.0337s] [ 19%] 2025-09-07T06:59:42.7710689Z dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_True PASSED [0.0308s] [ 19%] 2025-09-07T06:59:42.7710948Z dynamo/test_repros.py::ReproTests::test_empty_list_contains_with_jump PASSED [0.0196s] [ 19%] 2025-09-07T06:59:42.7711178Z dynamo/test_repros.py::ReproTests::test_empty_out_dynamic PASSED [0.4896s] [ 20%] 2025-09-07T06:59:42.7711387Z dynamo/test_repros.py::ReproTests::test_enum PASSED [0.0259s] [ 20%] 
2025-09-07T06:59:42.7711603Z dynamo/test_repros.py::ReproTests::test_ephemeral_module PASSED [0.0581s] [ 20%] 2025-09-07T06:59:42.7711839Z dynamo/test_repros.py::ReproTests::test_error_return_without_exception_set PASSED [0.0065s] [ 21%] 2025-09-07T06:59:42.7712092Z dynamo/test_repros.py::ReproTests::test_exception_in_dynamo_handling PASSED [0.0081s] [ 21%] 2025-09-07T06:59:42.7712317Z dynamo/test_repros.py::ReproTests::test_exec_import PASSED [0.0162s] [ 21%] 2025-09-07T06:59:42.7712530Z dynamo/test_repros.py::ReproTests::test_exec_wildcard_import PASSED [0.0471s] [ 21%] 2025-09-07T06:59:42.7712834Z dynamo/test_repros.py::ReproTests::test_flip_bad_accuracy SKIPPED [0.0002s] (Skip this flip test for the moment. It is under investigation) [ 22%] 2025-09-07T06:59:42.7713131Z dynamo/test_repros.py::ReproTests::test_for_loop_graph_break PASSED [0.0269s] [ 22%] 2025-09-07T06:59:42.7713362Z dynamo/test_repros.py::ReproTests::test_for_loop_graph_break_before PASSED [0.2190s] [ 22%] 2025-09-07T06:59:42.7713602Z dynamo/test_repros.py::ReproTests::test_foreach_decomp_arg_names PASSED [0.7357s] [ 23%] 2025-09-07T06:59:42.7713886Z dynamo/test_repros.py::ReproTests::test_fsdp_set_input_mutation_applied_when_input_gets_no_gradients PASSED [0.0008s] [ 23%] 2025-09-07T06:59:42.7714163Z dynamo/test_repros.py::ReproTests::test_function_in_skipfiles PASSED [0.0198s] [ 23%] 2025-09-07T06:59:42.7714380Z dynamo/test_repros.py::ReproTests::test_functools_wraps PASSED [0.0191s] [ 24%] 2025-09-07T06:59:42.7714693Z dynamo/test_repros.py::ReproTests::test_gan_repro_trying_to_backward_through_the_graph_a_second_time PASSED [0.2844s] [ 24%] 2025-09-07T06:59:42.7714962Z dynamo/test_repros.py::ReproTests::test_generator_dealloc PASSED [0.0131s] [ 24%] 2025-09-07T06:59:42.7715204Z dynamo/test_repros.py::ReproTests::test_get_parameter_dtype ('RERUN', {'yellow': True}) [0.0184s] [ 24%] 2025-09-07T06:59:42.7715473Z dynamo/test_repros.py::ReproTests::test_get_parameter_dtype ('RERUN', {'yellow': 
True}) [0.0142s] [ 24%] 2025-09-07T06:59:42.7715720Z dynamo/test_repros.py::ReproTests::test_get_parameter_dtype FAILED [0.0134s] [ 24%] 2025-09-07T06:59:42.7715850Z 2025-09-07T06:59:42.7715903Z ==================================== RERUNS ==================================== 2025-09-07T06:59:42.7716066Z _____________________ ReproTests.test_get_parameter_dtype ______________________ 2025-09-07T06:59:42.7716222Z Traceback (most recent call last): 2025-09-07T06:59:42.7716429Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1597, in test_get_parameter_dtype 2025-09-07T06:59:42.7716767Z self.assertEqual(opt_fn(model, torch.randn(10)).dtype, torch.float32) 2025-09-07T06:59:42.7716931Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7717160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T06:59:42.7717384Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7717558Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7717809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1875, in __call__ 2025-09-07T06:59:42.7718032Z result = self._torchdynamo_orig_backend( 2025-09-07T06:59:42.7718153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7718363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 688, in __call__ 2025-09-07T06:59:42.7718570Z result = _compile( 2025-09-07T06:59:42.7718665Z ^^^^^^^^^ 2025-09-07T06:59:42.7718862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1433, in _compile 2025-09-07T06:59:42.7719115Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-09-07T06:59:42.7719273Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7719492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_utils_internal.py", line 92, in wrapper_function 2025-09-07T06:59:42.7719711Z return function(*args, 
**kwargs) 2025-09-07T06:59:42.7719823Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7720037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1117, in compile_inner 2025-09-07T06:59:42.7720268Z return _compile_inner(code, one_graph, hooks) 2025-09-07T06:59:42.7720397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7720616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1151, in _compile_inner 2025-09-07T06:59:42.7720839Z dynamo_output = compile_frame( 2025-09-07T06:59:42.7720948Z ^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7721155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1032, in compile_frame 2025-09-07T06:59:42.7721406Z bytecode, tracer_output = transform_code_object(code, transform) 2025-09-07T06:59:42.7721561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7721818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1592, in transform_code_object 2025-09-07T06:59:42.7722098Z tracer_output = transformations(instructions, code_options) 2025-09-07T06:59:42.7722246Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7722466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1004, in transform 2025-09-07T06:59:42.7722680Z tracer_output = trace_frame( 2025-09-07T06:59:42.7722832Z ^^^^^^^^^^^^ 2025-09-07T06:59:42.7723027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 312, in _fn 2025-09-07T06:59:42.7723229Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7723328Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7723537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 815, in trace_frame 2025-09-07T06:59:42.7723754Z run_tracer() 2025-09-07T06:59:42.7723952Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 797, in run_tracer 2025-09-07T06:59:42.7724166Z tracer.run() 2025-09-07T06:59:42.7724357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1487, in run 2025-09-07T06:59:42.7724569Z while self.step(): 2025-09-07T06:59:42.7724670Z ^^^^^^^^^^^ 2025-09-07T06:59:42.7724872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1348, in step 2025-09-07T06:59:42.7725103Z self.dispatch_table[inst.opcode](self, inst) 2025-09-07T06:59:42.7725345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3351, in BINARY_OP 2025-09-07T06:59:42.7725582Z return _binary_op_lookup[inst.arg](self, inst) 2025-09-07T06:59:42.7725715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7725966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 462, in impl 2025-09-07T06:59:42.7726215Z self.push(fn_var.call_function(self, self.popn(nargs), {})) 2025-09-07T06:59:42.7726373Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7726780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1347, in call_function 2025-09-07T06:59:42.7727106Z return handler(tx, args, kwargs) 2025-09-07T06:59:42.7727272Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7727566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 966, in <lambda> 2025-09-07T06:59:42.7727895Z return lambda tx, args, kwargs: obj.call_function( 2025-09-07T06:59:42.7728089Z ^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7728408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1347, in call_function 2025-09-07T06:59:42.7728729Z return handler(tx, args, kwargs) 2025-09-07T06:59:42.7728858Z 
^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7729103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1211, in _handle_insert_op_in_graph 2025-09-07T06:59:42.7729382Z return dispatch_torch_function(tx, fn_var, args, kwargs) 2025-09-07T06:59:42.7729536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7729800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 487, in dispatch_torch_function 2025-09-07T06:59:42.7730089Z res = tx.symbolic_torch_function_state.call_torch_function_mode( 2025-09-07T06:59:42.7730247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7730536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 216, in call_torch_function_mode 2025-09-07T06:59:42.7730830Z return cur_mode.call_torch_function(tx, fn, types, args, kwargs) 2025-09-07T06:59:42.7730987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7731245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 334, in call_torch_function 2025-09-07T06:59:42.7731500Z return call_torch_function( 2025-09-07T06:59:42.7731616Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7731899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 451, in call_torch_function 2025-09-07T06:59:42.7732173Z return torch_function_var.call_function(tx, tf_args, {}) 2025-09-07T06:59:42.7732322Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7732565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 1154, in call_function 2025-09-07T06:59:42.7732818Z return super().call_function(tx, args, kwargs) 2025-09-07T06:59:42.7732953Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7733191Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 598, in call_function 2025-09-07T06:59:42.7733437Z return super().call_function(tx, args, kwargs) 2025-09-07T06:59:42.7733569Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7733807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 342, in call_function 2025-09-07T06:59:42.7734093Z return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs) 2025-09-07T06:59:42.7734274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7734535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1288, in inline_user_function_return 2025-09-07T06:59:42.7734883Z return InliningInstructionTranslator.inline_call(self, fn, args, kwargs) 2025-09-07T06:59:42.7735059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7735295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 4112, in inline_call 2025-09-07T06:59:42.7735526Z return tracer.inline_call_() 2025-09-07T06:59:42.7735640Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7735867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 4315, in inline_call_ 2025-09-07T06:59:42.7736093Z self.run() 2025-09-07T06:59:42.7736288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1487, in run 2025-09-07T06:59:42.7736560Z while self.step(): 2025-09-07T06:59:42.7736662Z ^^^^^^^^^^^ 2025-09-07T06:59:42.7736866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1348, in step 2025-09-07T06:59:42.7737097Z self.dispatch_table[inst.opcode](self, inst) 2025-09-07T06:59:42.7737332Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 904, in wrapper 2025-09-07T06:59:42.7737549Z return inner_fn(self, inst) 2025-09-07T06:59:42.7737664Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7737891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2371, in CALL_FUNCTION_EX 2025-09-07T06:59:42.7738147Z self.call_function(fn, argsvars.items, kwargsvars) 2025-09-07T06:59:42.7738399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1266, in call_function 2025-09-07T06:59:42.7738675Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-09-07T06:59:42.7738843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7739076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch.py", line 1438, in call_function 2025-09-07T06:59:42.7739325Z return self.call_tensor_method(tx, args, kwargs) 2025-09-07T06:59:42.7739467Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7739709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch.py", line 1832, in call_tensor_method 2025-09-07T06:59:42.7740039Z return args[0].call_method(tx, self.get_function().__name__, args[1:], kwargs) 2025-09-07T06:59:42.7740214Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7740452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py", line 713, in call_method 2025-09-07T06:59:42.7740675Z return wrap_fx_proxy( 2025-09-07T06:59:42.7740784Z ^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7741003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2644, in wrap_fx_proxy 2025-09-07T06:59:42.7741271Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-09-07T06:59:42.7741431Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7741680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2710, in wrap_fx_proxy_cls 2025-09-07T06:59:42.7741921Z return _wrap_fx_proxy( 2025-09-07T06:59:42.7742031Z ^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7742254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2808, in _wrap_fx_proxy 2025-09-07T06:59:42.7742531Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-09-07T06:59:42.7742704Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7742927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3478, in get_fake_value 2025-09-07T06:59:42.7743225Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-09-07T06:59:42.7743485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3376, in get_fake_value 2025-09-07T06:59:42.7743705Z ret_val = wrap_fake_exception( 2025-09-07T06:59:42.7743821Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7744039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 2864, in wrap_fake_exception 2025-09-07T06:59:42.7744260Z return fn() 2025-09-07T06:59:42.7744353Z ^^^^ 2025-09-07T06:59:42.7744536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3377, in <lambda> 2025-09-07T06:59:42.7744772Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-09-07T06:59:42.7744926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7745142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3587, in run_node 2025-09-07T06:59:42.7745379Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-09-07T06:59:42.7745607Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3557, in run_node 2025-09-07T06:59:42.7745867Z return getattr(args[0], node.target)(*args[1:], **kwargs) # type: ignore[arg-type] 2025-09-07T06:59:42.7746047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7746254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_stats.py", line 28, in wrapper 2025-09-07T06:59:42.7746458Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7746637Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7746867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 1376, in __torch_dispatch__ 2025-09-07T06:59:42.7747118Z return self.dispatch(func, types, args, kwargs) 2025-09-07T06:59:42.7747259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7747485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2092, in dispatch 2025-09-07T06:59:42.7747738Z return self._cached_dispatch_impl(func, types, args, kwargs) 2025-09-07T06:59:42.7747897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7748190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 1511, in _cached_dispatch_impl 2025-09-07T06:59:42.7748460Z output = self._dispatch_impl(func, types, args, kwargs) 2025-09-07T06:59:42.7748606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7748843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2759, in _dispatch_impl 2025-09-07T06:59:42.7749096Z self.wrap_meta_outputs_with_default_device_logic( 2025-09-07T06:59:42.7749381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2886, in wrap_meta_outputs_with_default_device_logic 2025-09-07T06:59:42.7749649Z return tree_map(wrap, r) 2025-09-07T06:59:42.7749761Z ^^^^^^^^^^^^^^^^^ 
2025-09-07T06:59:42.7749958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T06:59:42.7750186Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T06:59:42.7750329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7750545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T06:59:42.7750757Z leaves = list(leaves) 2025-09-07T06:59:42.7750863Z ^^^^^^^^^^^^ 2025-09-07T06:59:42.7751060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2864, in wrap 2025-09-07T06:59:42.7751339Z ) = FakeTensor._find_common_device(func, flat_args) 2025-09-07T06:59:42.7751482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7751722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 965, in _find_common_device 2025-09-07T06:59:42.7751955Z merge_devices(arg) 2025-09-07T06:59:42.7752169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 960, in merge_devices 2025-09-07T06:59:42.7752397Z raise RuntimeError( 2025-09-07T06:59:42.7752824Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method add(*(FakeTensor(..., device='cuda:0', size=(10,)), FakeTensor(..., size=(10,))), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices cuda:0, cpu') 2025-09-07T06:59:42.7753215Z 2025-09-07T06:59:42.7753253Z from user code: 2025-09-07T06:59:42.7753414Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1593, in fn 2025-09-07T06:59:42.7753625Z return x + torch.randn(10, dtype=get_parameter_dtype(model)) 2025-09-07T06:59:42.7753879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_device.py", line 103, in __torch_function__ 2025-09-07T06:59:42.7754106Z 
return func(*args, **kwargs) 2025-09-07T06:59:42.7754179Z 2025-09-07T06:59:42.7754395Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T06:59:42.7754642Z 2025-09-07T06:59:42.7754644Z 2025-09-07T06:59:42.7754724Z To execute this test, run the following from the base repo dir: 2025-09-07T06:59:42.7754956Z PYTORCH_TEST_WITH_ROCM=1 python test/dynamo/test_repros.py ReproTests.test_get_parameter_dtype 2025-09-07T06:59:42.7755110Z 2025-09-07T06:59:42.7755203Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T06:59:42.7755411Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T06:59:42.7755560Z inline_call [] 2025-09-07T06:59:42.7755695Z _____________________ ReproTests.test_get_parameter_dtype ______________________ 2025-09-07T06:59:42.7755858Z Traceback (most recent call last): 2025-09-07T06:59:42.7756066Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1597, in test_get_parameter_dtype 2025-09-07T06:59:42.7756311Z self.assertEqual(opt_fn(model, torch.randn(10)).dtype, torch.float32) 2025-09-07T06:59:42.7756589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7756821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T06:59:42.7757045Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7757158Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7757366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1875, in __call__ 2025-09-07T06:59:42.7757669Z result = self._torchdynamo_orig_backend( 2025-09-07T06:59:42.7757796Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7758010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 688, in __call__ 2025-09-07T06:59:42.7758226Z result = 
_compile( 2025-09-07T06:59:42.7758325Z ^^^^^^^^^ 2025-09-07T06:59:42.7758523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1433, in _compile 2025-09-07T06:59:42.7758784Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-09-07T06:59:42.7758950Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7759175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_utils_internal.py", line 92, in wrapper_function 2025-09-07T06:59:42.7759397Z return function(*args, **kwargs) 2025-09-07T06:59:42.7759517Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7759776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1117, in compile_inner 2025-09-07T06:59:42.7760016Z return _compile_inner(code, one_graph, hooks) 2025-09-07T06:59:42.7760149Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7760379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1151, in _compile_inner 2025-09-07T06:59:42.7762455Z dynamo_output = compile_frame( 2025-09-07T06:59:42.7762569Z ^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7762787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1032, in compile_frame 2025-09-07T06:59:42.7763043Z bytecode, tracer_output = transform_code_object(code, transform) 2025-09-07T06:59:42.7763200Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7763452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1592, in transform_code_object 2025-09-07T06:59:42.7763738Z tracer_output = transformations(instructions, code_options) 2025-09-07T06:59:42.7763888Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7764108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1004, in transform 2025-09-07T06:59:42.7764326Z tracer_output = 
trace_frame( 2025-09-07T06:59:42.7764434Z ^^^^^^^^^^^^ 2025-09-07T06:59:42.7765780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 312, in _fn 2025-09-07T06:59:42.7765983Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7766085Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7766287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 815, in trace_frame 2025-09-07T06:59:42.7766595Z run_tracer() 2025-09-07T06:59:42.7766833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 797, in run_tracer 2025-09-07T06:59:42.7767040Z tracer.run() 2025-09-07T06:59:42.7767224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1487, in run 2025-09-07T06:59:42.7767427Z while self.step(): 2025-09-07T06:59:42.7767523Z ^^^^^^^^^^^ 2025-09-07T06:59:42.7767713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1348, in step 2025-09-07T06:59:42.7770991Z self.dispatch_table[inst.opcode](self, inst) 2025-09-07T06:59:42.7771236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3351, in BINARY_OP 2025-09-07T06:59:42.7771469Z return _binary_op_lookup[inst.arg](self, inst) 2025-09-07T06:59:42.7771598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7771807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 462, in impl 2025-09-07T06:59:42.7772050Z self.push(fn_var.call_function(self, self.popn(nargs), {})) 2025-09-07T06:59:42.7772201Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7772433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1347, in call_function 2025-09-07T06:59:42.7772663Z return handler(tx, args, kwargs) 2025-09-07T06:59:42.7772773Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T06:59:42.7772989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 966, in <lambda> 2025-09-07T06:59:42.7774256Z return lambda tx, args, kwargs: obj.call_function( 2025-09-07T06:59:42.7774391Z ^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7774619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1347, in call_function 2025-09-07T06:59:42.7774891Z return handler(tx, args, kwargs) 2025-09-07T06:59:42.7774998Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7775232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1211, in _handle_insert_op_in_graph 2025-09-07T06:59:42.7775501Z return dispatch_torch_function(tx, fn_var, args, kwargs) 2025-09-07T06:59:42.7775643Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7775901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 487, in dispatch_torch_function 2025-09-07T06:59:42.7776180Z res = tx.symbolic_torch_function_state.call_torch_function_mode( 2025-09-07T06:59:42.7776328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7777625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 216, in call_torch_function_mode 2025-09-07T06:59:42.7777913Z return cur_mode.call_torch_function(tx, fn, types, args, kwargs) 2025-09-07T06:59:42.7778063Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7778309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 334, in call_torch_function 2025-09-07T06:59:42.7778554Z return call_torch_function( 2025-09-07T06:59:42.7778658Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7778890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 451, in 
call_torch_function 2025-09-07T06:59:42.7779158Z return torch_function_var.call_function(tx, tf_args, {}) 2025-09-07T06:59:42.7779299Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7779533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 1154, in call_function 2025-09-07T06:59:42.7779775Z return super().call_function(tx, args, kwargs) 2025-09-07T06:59:42.7780792Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7781023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 598, in call_function 2025-09-07T06:59:42.7781260Z return super().call_function(tx, args, kwargs) 2025-09-07T06:59:42.7781382Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7781605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 342, in call_function 2025-09-07T06:59:42.7781929Z return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs) 2025-09-07T06:59:42.7782104Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7782355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1288, in inline_user_function_return 2025-09-07T06:59:42.7782649Z return InliningInstructionTranslator.inline_call(self, fn, args, kwargs) 2025-09-07T06:59:42.7782822Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7783932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 4112, in inline_call 2025-09-07T06:59:42.7784161Z return tracer.inline_call_() 2025-09-07T06:59:42.7784267Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7784479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 4315, in inline_call_ 2025-09-07T06:59:42.7784699Z self.run() 2025-09-07T06:59:42.7784882Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1487, in run 2025-09-07T06:59:42.7785088Z while self.step(): 2025-09-07T06:59:42.7785180Z ^^^^^^^^^^^ 2025-09-07T06:59:42.7785371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1348, in step 2025-09-07T06:59:42.7785648Z self.dispatch_table[inst.opcode](self, inst) 2025-09-07T06:59:42.7785874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 904, in wrapper 2025-09-07T06:59:42.7787045Z return inner_fn(self, inst) 2025-09-07T06:59:42.7787156Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7787372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2371, in CALL_FUNCTION_EX 2025-09-07T06:59:42.7787619Z self.call_function(fn, argsvars.items, kwargsvars) 2025-09-07T06:59:42.7787862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1266, in call_function 2025-09-07T06:59:42.7788130Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-09-07T06:59:42.7788288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7788510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch.py", line 1438, in call_function 2025-09-07T06:59:42.7788753Z return self.call_tensor_method(tx, args, kwargs) 2025-09-07T06:59:42.7788886Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7789119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch.py", line 1832, in call_tensor_method 2025-09-07T06:59:42.7790251Z return args[0].call_method(tx, self.get_function().__name__, args[1:], kwargs) 2025-09-07T06:59:42.7790422Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7790651Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py", line 713, in call_method 2025-09-07T06:59:42.7790873Z return wrap_fx_proxy( 2025-09-07T06:59:42.7790971Z ^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7791179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2644, in wrap_fx_proxy 2025-09-07T06:59:42.7791437Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-09-07T06:59:42.7791588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7791824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2710, in wrap_fx_proxy_cls 2025-09-07T06:59:42.7792058Z return _wrap_fx_proxy( 2025-09-07T06:59:42.7792157Z ^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7793270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2808, in _wrap_fx_proxy 2025-09-07T06:59:42.7793548Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-09-07T06:59:42.7793711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7793926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3478, in get_fake_value 2025-09-07T06:59:42.7794179Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-09-07T06:59:42.7794431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3376, in get_fake_value 2025-09-07T06:59:42.7794641Z ret_val = wrap_fake_exception( 2025-09-07T06:59:42.7794750Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7794958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 2864, in wrap_fake_exception 2025-09-07T06:59:42.7795168Z return fn() 2025-09-07T06:59:42.7795253Z ^^^^ 2025-09-07T06:59:42.7796297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3377, in 
<lambda> 2025-09-07T06:59:42.7796604Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-09-07T06:59:42.7796751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7796953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3587, in run_node 2025-09-07T06:59:42.7797222Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-09-07T06:59:42.7797505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3557, in run_node 2025-09-07T06:59:42.7797759Z return getattr(args[0], node.target)(*args[1:], **kwargs) # type: ignore[arg-type] 2025-09-07T06:59:42.7797930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7798128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_stats.py", line 28, in wrapper 2025-09-07T06:59:42.7798328Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7798430Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7799547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 1376, in __torch_dispatch__ 2025-09-07T06:59:42.7799793Z return self.dispatch(func, types, args, kwargs) 2025-09-07T06:59:42.7799923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7800142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2092, in dispatch 2025-09-07T06:59:42.7800387Z return self._cached_dispatch_impl(func, types, args, kwargs) 2025-09-07T06:59:42.7800534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7800774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 1511, in _cached_dispatch_impl 2025-09-07T06:59:42.7801034Z output = self._dispatch_impl(func, types, args, kwargs) 2025-09-07T06:59:42.7801180Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7801408Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2759, in _dispatch_impl 2025-09-07T06:59:42.7801650Z self.wrap_meta_outputs_with_default_device_logic( 2025-09-07T06:59:42.7802784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2886, in wrap_meta_outputs_with_default_device_logic 2025-09-07T06:59:42.7803049Z return tree_map(wrap, r) 2025-09-07T06:59:42.7803151Z ^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7803338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T06:59:42.7803557Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T06:59:42.7803689Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7803969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T06:59:42.7804172Z leaves = list(leaves) 2025-09-07T06:59:42.7804268Z ^^^^^^^^^^^^ 2025-09-07T06:59:42.7804462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2864, in wrap 2025-09-07T06:59:42.7804688Z ) = FakeTensor._find_common_device(func, flat_args) 2025-09-07T06:59:42.7805693Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7805928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 965, in _find_common_device 2025-09-07T06:59:42.7806154Z merge_devices(arg) 2025-09-07T06:59:42.7806358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 960, in merge_devices 2025-09-07T06:59:42.7806709Z raise RuntimeError( 2025-09-07T06:59:42.7807148Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method add(*(FakeTensor(..., device='cuda:0', size=(10,)), FakeTensor(..., size=(10,))), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.add.Tensor, found 
two different devices cuda:0, cpu') 2025-09-07T06:59:42.7807535Z 2025-09-07T06:59:42.7807569Z from user code: 2025-09-07T06:59:42.7807722Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1593, in fn 2025-09-07T06:59:42.7807922Z return x + torch.randn(10, dtype=get_parameter_dtype(model)) 2025-09-07T06:59:42.7808218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_device.py", line 103, in __torch_function__ 2025-09-07T06:59:42.7808434Z return func(*args, **kwargs) 2025-09-07T06:59:42.7809440Z 2025-09-07T06:59:42.7809652Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T06:59:42.7809894Z 2025-09-07T06:59:42.7809896Z 2025-09-07T06:59:42.7809973Z To execute this test, run the following from the base repo dir: 2025-09-07T06:59:42.7810195Z PYTORCH_TEST_WITH_ROCM=1 python test/dynamo/test_repros.py ReproTests.test_get_parameter_dtype 2025-09-07T06:59:42.7810345Z 2025-09-07T06:59:42.7810434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T06:59:42.7810628Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T06:59:42.7810774Z inline_call [] 2025-09-07T06:59:42.7810894Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T06:59:42.7811035Z inline_call [] 2025-09-07T06:59:42.7811134Z =================================== FAILURES =================================== 2025-09-07T06:59:42.7811298Z _____________________ ReproTests.test_get_parameter_dtype ______________________ 2025-09-07T06:59:42.7811450Z Traceback (most recent call last): 2025-09-07T06:59:42.7812512Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1597, in test_get_parameter_dtype 2025-09-07T06:59:42.7812757Z self.assertEqual(opt_fn(model, torch.randn(10)).dtype, torch.float32) 
2025-09-07T06:59:42.7812914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7813137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper 2025-09-07T06:59:42.7813357Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7813459Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7813660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1875, in __call__ 2025-09-07T06:59:42.7813880Z result = self._torchdynamo_orig_backend( 2025-09-07T06:59:42.7814000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7814206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 688, in __call__ 2025-09-07T06:59:42.7814413Z result = _compile( 2025-09-07T06:59:42.7815374Z ^^^^^^^^^ 2025-09-07T06:59:42.7815612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1433, in _compile 2025-09-07T06:59:42.7815864Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-09-07T06:59:42.7816020Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7816236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_utils_internal.py", line 92, in wrapper_function 2025-09-07T06:59:42.7816450Z return function(*args, **kwargs) 2025-09-07T06:59:42.7816630Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7816842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1117, in compile_inner 2025-09-07T06:59:42.7817074Z return _compile_inner(code, one_graph, hooks) 2025-09-07T06:59:42.7817199Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7818302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1151, in _compile_inner 2025-09-07T06:59:42.7818529Z dynamo_output = compile_frame( 2025-09-07T06:59:42.7818637Z ^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7818845Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1032, in compile_frame 2025-09-07T06:59:42.7819096Z bytecode, tracer_output = transform_code_object(code, transform) 2025-09-07T06:59:42.7819299Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7819554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/bytecode_transformation.py", line 1592, in transform_code_object 2025-09-07T06:59:42.7819832Z tracer_output = transformations(instructions, code_options) 2025-09-07T06:59:42.7819980Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7820199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1004, in transform 2025-09-07T06:59:42.7820418Z tracer_output = trace_frame( 2025-09-07T06:59:42.7821382Z ^^^^^^^^^^^^ 2025-09-07T06:59:42.7821579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 312, in _fn 2025-09-07T06:59:42.7821781Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7821882Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7822082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 815, in trace_frame 2025-09-07T06:59:42.7822291Z run_tracer() 2025-09-07T06:59:42.7822478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 797, in run_tracer 2025-09-07T06:59:42.7822684Z tracer.run() 2025-09-07T06:59:42.7822868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1487, in run 2025-09-07T06:59:42.7823072Z while self.step(): 2025-09-07T06:59:42.7823164Z ^^^^^^^^^^^ 2025-09-07T06:59:42.7824206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1348, in step 2025-09-07T06:59:42.7824432Z self.dispatch_table[inst.opcode](self, inst) 2025-09-07T06:59:42.7824663Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 3351, in BINARY_OP 2025-09-07T06:59:42.7824893Z return _binary_op_lookup[inst.arg](self, inst) 2025-09-07T06:59:42.7825022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7825231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 462, in impl 2025-09-07T06:59:42.7825469Z self.push(fn_var.call_function(self, self.popn(nargs), {})) 2025-09-07T06:59:42.7825615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7825844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1347, in call_function 2025-09-07T06:59:42.7826117Z return handler(tx, args, kwargs) 2025-09-07T06:59:42.7826228Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7827380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 966, in 2025-09-07T06:59:42.7827618Z return lambda tx, args, kwargs: obj.call_function( 2025-09-07T06:59:42.7827751Z ^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7827978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1347, in call_function 2025-09-07T06:59:42.7828202Z return handler(tx, args, kwargs) 2025-09-07T06:59:42.7828310Z ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7828543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builtin.py", line 1211, in _handle_insert_op_in_graph 2025-09-07T06:59:42.7828810Z return dispatch_torch_function(tx, fn_var, args, kwargs) 2025-09-07T06:59:42.7828956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7829209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 487, in dispatch_torch_function 2025-09-07T06:59:42.7829488Z res = tx.symbolic_torch_function_state.call_torch_function_mode( 2025-09-07T06:59:42.7830490Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7830805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 216, in call_torch_function_mode 2025-09-07T06:59:42.7831083Z return cur_mode.call_torch_function(tx, fn, types, args, kwargs) 2025-09-07T06:59:42.7831230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7831477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 334, in call_torch_function 2025-09-07T06:59:42.7831721Z return call_torch_function( 2025-09-07T06:59:42.7831826Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7832054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch_function.py", line 451, in call_torch_function 2025-09-07T06:59:42.7832322Z return torch_function_var.call_function(tx, tf_args, {}) 2025-09-07T06:59:42.7832462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7832696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 1154, in call_function 2025-09-07T06:59:42.7833821Z return super().call_function(tx, args, kwargs) 2025-09-07T06:59:42.7833951Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7834178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 598, in call_function 2025-09-07T06:59:42.7834414Z return super().call_function(tx, args, kwargs) 2025-09-07T06:59:42.7834536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7834763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/functions.py", line 342, in call_function 2025-09-07T06:59:42.7835041Z return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs) 2025-09-07T06:59:42.7835213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T06:59:42.7835463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1288, in inline_user_function_return 2025-09-07T06:59:42.7835764Z return InliningInstructionTranslator.inline_call(self, fn, args, kwargs) 2025-09-07T06:59:42.7835935Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7837084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 4112, in inline_call 2025-09-07T06:59:42.7837307Z return tracer.inline_call_() 2025-09-07T06:59:42.7837413Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7837740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 4315, in inline_call_ 2025-09-07T06:59:42.7837956Z self.run() 2025-09-07T06:59:42.7838138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1487, in run 2025-09-07T06:59:42.7838344Z while self.step(): 2025-09-07T06:59:42.7838436Z ^^^^^^^^^^^ 2025-09-07T06:59:42.7838628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1348, in step 2025-09-07T06:59:42.7838851Z self.dispatch_table[inst.opcode](self, inst) 2025-09-07T06:59:42.7839078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 904, in wrapper 2025-09-07T06:59:42.7840176Z return inner_fn(self, inst) 2025-09-07T06:59:42.7840282Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7840504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 2371, in CALL_FUNCTION_EX 2025-09-07T06:59:42.7840754Z self.call_function(fn, argsvars.items, kwargsvars) 2025-09-07T06:59:42.7840994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1266, in call_function 2025-09-07T06:59:42.7841259Z self.push(fn.call_function(self, args, kwargs)) # type: 
ignore[arg-type] 2025-09-07T06:59:42.7841468Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7841690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch.py", line 1438, in call_function 2025-09-07T06:59:42.7841932Z return self.call_tensor_method(tx, args, kwargs) 2025-09-07T06:59:42.7842062Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7842296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/torch.py", line 1832, in call_tensor_method 2025-09-07T06:59:42.7843444Z return args[0].call_method(tx, self.get_function().__name__, args[1:], kwargs) 2025-09-07T06:59:42.7843611Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7843838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/tensor.py", line 713, in call_method 2025-09-07T06:59:42.7844057Z return wrap_fx_proxy( 2025-09-07T06:59:42.7844155Z ^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7844366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2644, in wrap_fx_proxy 2025-09-07T06:59:42.7844624Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-09-07T06:59:42.7844775Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7845013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2710, in wrap_fx_proxy_cls 2025-09-07T06:59:42.7845250Z return _wrap_fx_proxy( 2025-09-07T06:59:42.7845353Z ^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7846461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/variables/builder.py", line 2808, in _wrap_fx_proxy 2025-09-07T06:59:42.7846873Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-09-07T06:59:42.7847038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7847252Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3478, in get_fake_value 2025-09-07T06:59:42.7847507Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-09-07T06:59:42.7847757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3376, in get_fake_value 2025-09-07T06:59:42.7847969Z ret_val = wrap_fake_exception( 2025-09-07T06:59:42.7848078Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7848336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 2864, in wrap_fake_exception 2025-09-07T06:59:42.7848547Z return fn() 2025-09-07T06:59:42.7848631Z ^^^^ 2025-09-07T06:59:42.7849763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3377, in 2025-09-07T06:59:42.7849990Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-09-07T06:59:42.7850135Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7850340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3587, in run_node 2025-09-07T06:59:42.7850564Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-09-07T06:59:42.7850784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/utils.py", line 3557, in run_node 2025-09-07T06:59:42.7851033Z return getattr(args[0], node.target)(*args[1:], **kwargs) # type: ignore[arg-type] 2025-09-07T06:59:42.7851206Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7851410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_stats.py", line 28, in wrapper 2025-09-07T06:59:42.7851604Z return fn(*args, **kwargs) 2025-09-07T06:59:42.7852601Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7852823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 1376, in __torch_dispatch__ 2025-09-07T06:59:42.7853112Z return self.dispatch(func, types, args, kwargs) 
2025-09-07T06:59:42.7853240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7853457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2092, in dispatch 2025-09-07T06:59:42.7853701Z return self._cached_dispatch_impl(func, types, args, kwargs) 2025-09-07T06:59:42.7853850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7854094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 1511, in _cached_dispatch_impl 2025-09-07T06:59:42.7854354Z output = self._dispatch_impl(func, types, args, kwargs) 2025-09-07T06:59:42.7854495Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7854722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2759, in _dispatch_impl 2025-09-07T06:59:42.7855881Z self.wrap_meta_outputs_with_default_device_logic( 2025-09-07T06:59:42.7856165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2886, in wrap_meta_outputs_with_default_device_logic 2025-09-07T06:59:42.7856426Z return tree_map(wrap, r) 2025-09-07T06:59:42.7856644Z ^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7856830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T06:59:42.7857048Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T06:59:42.7857186Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7857386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T06:59:42.7857587Z leaves = list(leaves) 2025-09-07T06:59:42.7857685Z ^^^^^^^^^^^^ 2025-09-07T06:59:42.7857876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2864, in wrap 2025-09-07T06:59:42.7859016Z ) = FakeTensor._find_common_device(func, flat_args) 2025-09-07T06:59:42.7859152Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T06:59:42.7859383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 965, in _find_common_device 2025-09-07T06:59:42.7859608Z merge_devices(arg) 2025-09-07T06:59:42.7859810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 960, in merge_devices 2025-09-07T06:59:42.7860028Z raise RuntimeError( 2025-09-07T06:59:42.7860493Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method add(*(FakeTensor(..., device='cuda:0', size=(10,)), FakeTensor(..., size=(10,))), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices cuda:0, cpu') 2025-09-07T06:59:42.7860877Z 2025-09-07T06:59:42.7860914Z from user code: 2025-09-07T06:59:42.7861066Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1593, in fn 2025-09-07T06:59:42.7861265Z return x + torch.randn(10, dtype=get_parameter_dtype(model)) 2025-09-07T06:59:42.7861510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_device.py", line 103, in __torch_function__ 2025-09-07T06:59:42.7862638Z return func(*args, **kwargs) 2025-09-07T06:59:42.7862712Z 2025-09-07T06:59:42.7862925Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T06:59:42.7863165Z 2025-09-07T06:59:42.7863167Z 2025-09-07T06:59:42.7863240Z To execute this test, run the following from the base repo dir: 2025-09-07T06:59:42.7863463Z PYTORCH_TEST_WITH_ROCM=1 python test/dynamo/test_repros.py ReproTests.test_get_parameter_dtype 2025-09-07T06:59:42.7863615Z 2025-09-07T06:59:42.7863703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T06:59:42.7863942Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T06:59:42.7864084Z inline_call [] 2025-09-07T06:59:42.7864206Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T06:59:42.7864345Z inline_call [] 2025-09-07T06:59:42.7864463Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T06:59:42.7864600Z inline_call [] 2025-09-07T06:59:42.7865753Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-c96249d0993a3f5e.xml - 2025-09-07T06:59:42.7866037Z =========================== short test summary info ============================ 2025-09-07T06:59:42.7866711Z FAILED [0.0134s] dynamo/test_repros.py::ReproTests::test_get_parameter_dtype - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method add(*(FakeTensor(..., device='cuda:0', size=(10,)), FakeTensor(..., size=(10,))), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices cuda:0, cpu') 2025-09-07T06:59:42.7867197Z 2025-09-07T06:59:42.7867231Z from user code: 2025-09-07T06:59:42.7867378Z File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 1593, in fn 2025-09-07T06:59:42.7867571Z return x + torch.randn(10, dtype=get_parameter_dtype(model)) 2025-09-07T06:59:42.7867816Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_device.py", line 103, in __torch_function__ 2025-09-07T06:59:42.7868034Z return func(*args, **kwargs) 2025-09-07T06:59:42.7868103Z 2025-09-07T06:59:42.7868311Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T06:59:42.7868551Z 2025-09-07T06:59:42.7868553Z 2025-09-07T06:59:42.7868626Z To execute this test, run the following from the base repo dir: 2025-09-07T06:59:42.7868849Z PYTORCH_TEST_WITH_ROCM=1 python test/dynamo/test_repros.py ReproTests.test_get_parameter_dtype 2025-09-07T06:59:42.7869942Z 2025-09-07T06:59:42.7870031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T06:59:42.7870212Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T06:59:42.7870378Z ============== 1 failed, 71 passed, 11 skipped, 2 rerun in 17.56s ============== 2025-09-07T06:59:42.7870514Z Got exit code 1 2025-09-07T06:59:42.7870602Z Retrying single test... 2025-09-07T06:59:42.7871147Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T06:59:42.7871637Z import pkg_resources 2025-09-07T06:59:42.7871844Z Test results will be stored in test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-1b596d65104c597c.xml 2025-09-07T06:59:42.7872081Z ============================= test session starts ============================== 2025-09-07T06:59:42.7872287Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T06:59:42.7872470Z cachedir: .pytest_cache 2025-09-07T06:59:42.7873615Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T06:59:42.7873856Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T06:59:42.7873968Z configfile: pytest.ini 2025-09-07T06:59:42.7874192Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T06:59:42.7874462Z collecting ... collected 333 items / 332 deselected / 1 selected 2025-09-07T06:59:42.7874718Z stepcurrent: skipping 82 already run items. Running only test/dynamo/test_repros.py::ReproTests::test_get_parameter_dtype 2025-09-07T06:59:42.7874985Z Running 1 items in this shard 2025-09-07T06:59:42.7875054Z 2025-09-07T06:59:42.7875155Z dynamo/test_repros.py::ReproTests::test_get_parameter_dtype PASSED [0.2011s] [100%] 2025-09-07T06:59:42.7875287Z 2025-09-07T06:59:42.7875479Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-1b596d65104c597c.xml - 2025-09-07T06:59:42.7875763Z ====================== 1 passed, 332 deselected in 0.21s ======================= 2025-09-07T06:59:42.7876903Z Got exit code 0 2025-09-07T06:59:42.7877043Z Test succeeeded in new process, continuing with the rest of the tests 2025-09-07T06:59:42.7877655Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T06:59:42.7878143Z import pkg_resources 2025-09-07T06:59:42.7878348Z Test results will be stored in test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-ca4656eacb8dd0c3.xml 2025-09-07T06:59:42.7878583Z ============================= test session starts ============================== 2025-09-07T06:59:42.7878784Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T06:59:42.7878966Z cachedir: .pytest_cache 2025-09-07T06:59:42.7879186Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T06:59:42.7879417Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T06:59:42.7879527Z configfile: pytest.ini 2025-09-07T06:59:42.7880706Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T06:59:42.7880976Z collecting ... collected 333 items / 83 deselected / 250 selected 2025-09-07T06:59:42.7881135Z stepcurrent: skipping 83 already run items. 
2025-09-07T06:59:42.7881259Z Running 250 items in this shard 2025-09-07T06:59:42.7881328Z 2025-09-07T06:59:42.7881422Z dynamo/test_repros.py::ReproTests::test_get_type_hints PASSED [0.2224s] [ 0%] 2025-09-07T06:59:42.7881640Z dynamo/test_repros.py::ReproTests::test_global_fn_mutation PASSED [0.0341s] [ 0%] 2025-09-07T06:59:42.7881854Z dynamo/test_repros.py::ReproTests::test_grad PASSED [0.0196s] [ 1%] 2025-09-07T06:59:42.7882155Z dynamo/test_repros.py::ReproTests::test_grad_mode_carrying_correct_state_after_graph_break PASSED [0.0366s] [ 1%] 2025-09-07T06:59:42.7882608Z dynamo/test_repros.py::ReproTests::test_grad_references_cleared W0907 06:58:37.685000 252905 site-packages/torch/_logging/_internal.py:1199] [0/0] Profiler function will be ignored 2025-09-07T06:59:42.7882962Z PASSED [0.8788s] [ 2%] 2025-09-07T06:59:42.7883134Z dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance PASSED [0.0230s] [ 2%] 2025-09-07T06:59:42.7884326Z dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance_pep585 PASSED [0.0208s] [ 2%] 2025-09-07T06:59:42.7884982Z dynamo/test_repros.py::ReproTests::test_graph_break_unsupported_fake SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156629 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 3%] 2025-09-07T06:59:42.7885603Z dynamo/test_repros.py::ReproTests::test_guard_default_device PASSED [0.0532s] [ 3%] 2025-09-07T06:59:42.7885832Z dynamo/test_repros.py::ReproTests::test_guard_fail_nested_tuple PASSED [0.0171s] [ 4%] 2025-09-07T06:59:42.7886063Z dynamo/test_repros.py::ReproTests::test_guard_fail_tensor_bool PASSED [3.8552s] [ 4%] 2025-09-07T06:59:42.7886338Z dynamo/test_repros.py::ReproTests::test_guard_ordering_shape_fail PASSED [0.0011s] [ 4%] 2025-09-07T06:59:42.7886699Z dynamo/test_repros.py::ReproTests::test_guard_with_tuple_mutation PASSED [0.0328s] [ 5%] 2025-09-07T06:59:42.7886935Z dynamo/test_repros.py::ReproTests::test_hasattr_builtin PASSED [0.0285s] [ 5%] 2025-09-07T06:59:42.7887152Z dynamo/test_repros.py::ReproTests::test_hf_bigbird_unsqueeze PASSED [0.1609s] [ 6%] 2025-09-07T06:59:42.7887374Z dynamo/test_repros.py::ReproTests::test_hf_classinstantier PASSED [0.0598s] [ 6%] 2025-09-07T06:59:42.7887590Z dynamo/test_repros.py::ReproTests::test_hf_gelu_inline PASSED [0.0597s] [ 6%] 2025-09-07T06:59:42.7888799Z dynamo/test_repros.py::ReproTests::test_hf_model_output PASSED [0.0679s] [ 7%] 2025-09-07T06:59:42.7889009Z dynamo/test_repros.py::ReproTests::test_hf_t5_forward PASSED [0.4473s] [ 7%] 2025-09-07T06:59:42.7889225Z dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_inference PASSED [0.0394s] [ 8%] 2025-09-07T06:59:42.7889451Z dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_training PASSED [0.0596s] [ 8%] 2025-09-07T06:59:42.7889668Z dynamo/test_repros.py::ReproTests::test_iadd_graph_break PASSED [0.0198s] [ 8%] 2025-09-07T06:59:42.7889882Z dynamo/test_repros.py::ReproTests::test_incompatible_configs PASSED [0.0027s] [ 9%] 2025-09-07T06:59:42.7890099Z dynamo/test_repros.py::ReproTests::test_indexing_with_list PASSED [0.0253s] [ 9%] 2025-09-07T06:59:42.7890338Z dynamo/test_repros.py::ReproTests::test_inductor_dynamic_shapes_broadcasting PASSED [1.4217s] [ 10%] 2025-09-07T06:59:42.7890616Z 
dynamo/test_repros.py::ReproTests::test_inductor_no_recursionerror_on_for_loops PASSED [19.4282s] [ 10%] 2025-09-07T06:59:42.7890878Z dynamo/test_repros.py::ReproTests::test_inductor_rng_default_dtype PASSED [0.3510s] [ 10%] 2025-09-07T06:59:42.7891123Z dynamo/test_repros.py::ReproTests::test_inference_mode_dynamic_shapes PASSED [0.4819s] [ 11%] 2025-09-07T06:59:42.7892314Z dynamo/test_repros.py::ReproTests::test_inlining_cornercase PASSED [0.0347s] [ 11%] 2025-09-07T06:59:42.7892546Z dynamo/test_repros.py::ReproTests::test_inplace_unsqueeze_input PASSED [0.0341s] [ 12%] 2025-09-07T06:59:42.7892835Z dynamo/test_repros.py::ReproTests::test_int_format SKIPPED [0.0002s] (Fails with incorrect result with fullgraph constraints) [ 12%] 2025-09-07T06:59:42.7893134Z dynamo/test_repros.py::ReproTests::test_intermediate_leaf_requires_grad PASSED [0.0568s] [ 12%] 2025-09-07T06:59:42.7893370Z dynamo/test_repros.py::ReproTests::test_invalid_seq_unpack PASSED [0.0144s] [ 13%] 2025-09-07T06:59:42.7893639Z dynamo/test_repros.py::ReproTests::test_is_make_fx_tracing PASSED [0.0157s] [ 13%] 2025-09-07T06:59:42.7893857Z dynamo/test_repros.py::ReproTests::test_is_symbolic_tracing PASSED [0.0172s] [ 14%] 2025-09-07T06:59:42.7894075Z dynamo/test_repros.py::ReproTests::test_isinstance_dtype PASSED [0.0149s] [ 14%] 2025-09-07T06:59:42.7894288Z dynamo/test_repros.py::ReproTests::test_isinstance_storage PASSED [0.0355s] [ 14%] 2025-09-07T06:59:42.7894499Z dynamo/test_repros.py::ReproTests::test_issue111522 PASSED [0.0207s] [ 15%] 2025-09-07T06:59:42.7894706Z dynamo/test_repros.py::ReproTests::test_issue111918 PASSED [0.0412s] [ 15%] 2025-09-07T06:59:42.7894909Z dynamo/test_repros.py::ReproTests::test_issue114171 PASSED [0.4956s] [ 16%] 2025-09-07T06:59:42.7896029Z dynamo/test_repros.py::ReproTests::test_issue126128 PASSED [0.7997s] [ 16%] 2025-09-07T06:59:42.7896232Z dynamo/test_repros.py::ReproTests::test_issue134451 PASSED [0.0454s] [ 16%] 2025-09-07T06:59:42.7896459Z 
dynamo/test_repros.py::ReproTests::test_issue1466_size_aot_autograd PASSED [0.0758s] [ 17%] 2025-09-07T06:59:42.7896751Z dynamo/test_repros.py::ReproTests::test_issue175 PASSED [0.0999s] [ 17%] 2025-09-07T06:59:42.7896961Z dynamo/test_repros.py::ReproTests::test_jit_script_defaults PASSED [0.0289s] [ 18%] 2025-09-07T06:59:42.7897175Z dynamo/test_repros.py::ReproTests::test_jit_trace_errors PASSED [0.0015s] [ 18%] 2025-09-07T06:59:42.7897393Z dynamo/test_repros.py::ReproTests::test_kwargs_out_list_variable PASSED [0.0276s] [ 18%] 2025-09-07T06:59:42.7897662Z dynamo/test_repros.py::ReproTests::test_list_aliasing PASSED [0.0179s] [ 19%] 2025-09-07T06:59:42.7897867Z dynamo/test_repros.py::ReproTests::test_list_index PASSED [0.7242s] [ 19%] 2025-09-07T06:59:42.7898079Z dynamo/test_repros.py::ReproTests::test_list_index_not_found PASSED [0.0092s] [ 20%] 2025-09-07T06:59:42.7898314Z dynamo/test_repros.py::ReproTests::test_list_index_tensor_unsupported PASSED [0.0344s] [ 20%] 2025-09-07T06:59:42.7899831Z dynamo/test_repros.py::ReproTests::test_list_reverse PASSED [0.0210s] [ 20%] 2025-09-07T06:59:42.7900047Z dynamo/test_repros.py::ReproTests::test_list_self_reference PASSED [0.0057s] [ 21%] 2025-09-07T06:59:42.7900298Z dynamo/test_repros.py::ReproTests::test_listcomp SKIPPED [0.0002s] (Not supported in Python 3.12+) [ 21%] 2025-09-07T06:59:42.7900548Z dynamo/test_repros.py::ReproTests::test_longformer_chunk PASSED [0.6999s] [ 22%] 2025-09-07T06:59:42.7900758Z dynamo/test_repros.py::ReproTests::test_longtensor_list PASSED [0.1050s] [ 22%] 2025-09-07T06:59:42.7900971Z dynamo/test_repros.py::ReproTests::test_lru_cache_tracing XFAIL [0.0386s] [ 22%] 2025-09-07T06:59:42.7901179Z dynamo/test_repros.py::ReproTests::test_maml_item_capture XFAIL [0.0008s] [ 23%] 2025-09-07T06:59:42.7901394Z dynamo/test_repros.py::ReproTests::test_maml_no_item_capture PASSED [0.9671s] [ 23%] 2025-09-07T06:59:42.7901654Z 
dynamo/test_repros.py::ReproTests::test_many_overlapping_inputs_does_not_explode_guards PASSED [12.1608s] [ 24%] 2025-09-07T06:59:42.7901922Z dynamo/test_repros.py::ReproTests::test_many_views_with_mutation PASSED [0.0945s] [ 24%] 2025-09-07T06:59:42.7903033Z dynamo/test_repros.py::ReproTests::test_map_with_multiple_args PASSED [0.0384s] [ 24%] 2025-09-07T06:59:42.7903264Z dynamo/test_repros.py::ReproTests::test_maybe_multiply_symint PASSED [0.4527s] [ 25%] 2025-09-07T06:59:42.7903478Z dynamo/test_repros.py::ReproTests::test_mem_leak_guards PASSED [0.1149s] [ 25%] 2025-09-07T06:59:42.7903709Z dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list1 PASSED [0.0477s] [ 26%] 2025-09-07T06:59:42.7903961Z dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list2 PASSED [0.0416s] [ 26%] 2025-09-07T06:59:42.7904192Z dynamo/test_repros.py::ReproTests::test_method_overriding PASSED [0.0195s] [ 26%] 2025-09-07T06:59:42.7904408Z dynamo/test_repros.py::ReproTests::test_module_in_skipfiles PASSED [0.0210s] [ 27%] 2025-09-07T06:59:42.7904619Z dynamo/test_repros.py::ReproTests::test_modules PASSED [0.0290s] [ 27%] 2025-09-07T06:59:42.7904872Z dynamo/test_repros.py::ReproTests::test_multi_dot_import PASSED [0.0450s] [ 28%] 2025-09-07T06:59:42.7905111Z dynamo/test_repros.py::ReproTests::test_multi_import SKIPPED [0.0010s] (requires detectron2) [ 28%] 2025-09-07T06:59:42.7905347Z dynamo/test_repros.py::ReproTests::test_named_buffers PASSED [0.0271s] [ 28%] 2025-09-07T06:59:42.7906422Z dynamo/test_repros.py::ReproTests::test_nanmean_out PASSED [0.0173s] [ 29%] 2025-09-07T06:59:42.7906712Z dynamo/test_repros.py::ReproTests::test_negative_floor_div_solve PASSED [0.0395s] [ 29%] 2025-09-07T06:59:42.7906938Z dynamo/test_repros.py::ReproTests::test_negative_shape_guard PASSED [0.0750s] [ 30%] 2025-09-07T06:59:42.7907170Z dynamo/test_repros.py::ReproTests::test_nested_while_loop_graph_break PASSED [0.0405s] [ 30%] 2025-09-07T06:59:42.7907401Z 
dynamo/test_repros.py::ReproTests::test_nn_module_callable PASSED [0.0121s] [ 30%] 2025-09-07T06:59:42.7907627Z dynamo/test_repros.py::ReproTests::test_nn_module_property_closure PASSED [0.0210s] [ 31%] 2025-09-07T06:59:42.7907858Z dynamo/test_repros.py::ReproTests::test_nn_module_stack_bc PASSED [0.0281s] [ 31%] 2025-09-07T06:59:42.7908179Z dynamo/test_repros.py::ReproTests::test_nn_param_freevar_codegen [W907 06:59:23.685940262 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T06:59:42.7908515Z [W907 06:59:23.733915659 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T06:59:42.7908825Z [W907 06:59:23.772751022 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T06:59:42.7909001Z PASSED [0.1114s] [ 32%] 2025-09-07T06:59:42.7910026Z dynamo/test_repros.py::ReproTests::test_nn_parameter PASSED [0.0455s] [ 32%] 2025-09-07T06:59:42.7910263Z dynamo/test_repros.py::ReproTests::test_nn_parameter_ctor_graph_breaks PASSED [0.0339s] [ 32%] 2025-09-07T06:59:42.7910493Z dynamo/test_repros.py::ReproTests::test_nn_parametrize PASSED [0.0437s] [ 33%] 2025-09-07T06:59:42.7910705Z dynamo/test_repros.py::ReproTests::test_no_grad_inline PASSED [0.0181s] [ 33%] 2025-09-07T06:59:42.7910926Z dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame PASSED [0.0151s] [ 34%] 2025-09-07T06:59:42.7911178Z dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame_ctx_manager PASSED [0.0146s] [ 34%] 2025-09-07T06:59:42.7911422Z dynamo/test_repros.py::ReproTests::test_nonconst_issubclass PASSED [0.0206s] [ 34%] 2025-09-07T06:59:42.7911663Z dynamo/test_repros.py::ReproTests::test_not_rewrite_assert_for_other_errors PASSED [0.0185s] [ 35%] 2025-09-07T06:59:42.7911906Z dynamo/test_repros.py::ReproTests::test_nullcontext1 PASSED [0.0155s] [ 35%] 2025-09-07T06:59:42.7912114Z dynamo/test_repros.py::ReproTests::test_nullcontext2 PASSED [0.0157s] [ 36%] 2025-09-07T06:59:42.7912337Z 
dynamo/test_repros.py::ReproTests::test_numpy_not_ndarray_recompiles PASSED [0.0657s] [ 36%] 2025-09-07T06:59:42.7913421Z dynamo/test_repros.py::ReproTests::test_numpy_tobytes_no_error PASSED [0.0356s] [ 36%] 2025-09-07T06:59:42.7913659Z dynamo/test_repros.py::ReproTests::test_odict_get_item_index_name PASSED [0.0175s] [ 37%] 2025-09-07T06:59:42.7913928Z dynamo/test_repros.py::ReproTests::test_omegaconf_dictconfig SKIPPED [0.0002s] (missing omegaconf package) [ 37%] 2025-09-07T06:59:42.7914244Z dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_contains SKIPPED [0.0001s] (missing omegaconf package) [ 38%] 2025-09-07T06:59:42.7914526Z dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_iter PASSED [0.0246s] [ 38%] 2025-09-07T06:59:42.7914752Z dynamo/test_repros.py::ReproTests::test_ones_out_dynamic PASSED [1.1098s] [ 38%] 2025-09-07T06:59:42.7914983Z dynamo/test_repros.py::ReproTests::test_optim_state_references_cleared PASSED [0.1533s] [ 39%] 2025-09-07T06:59:42.7915331Z dynamo/test_repros.py::ReproTests::test_optimized_deepcopy W0907 06:59:24.881000 252905 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-09-07T06:59:42.7915605Z PASSED [0.0296s] [ 39%] 2025-09-07T06:59:42.7915817Z dynamo/test_repros.py::ReproTests::test_optimized_module_patched_init PASSED [0.3240s] [ 40%] 2025-09-07T06:59:42.7916059Z dynamo/test_repros.py::ReproTests::test_optimized_module_training PASSED [0.0015s] [ 40%] 2025-09-07T06:59:42.7917207Z dynamo/test_repros.py::ReproTests::test_os_fspath PASSED [0.0187s] [ 40%] 2025-09-07T06:59:42.7917433Z dynamo/test_repros.py::ReproTests::test_out_nested_cell_shape_change PASSED [0.0431s] [ 41%] 2025-09-07T06:59:42.7917771Z dynamo/test_repros.py::ReproTests::test_out_nested_cell_tuple_shape_change PASSED [0.0437s] [ 41%] 2025-09-07T06:59:42.7918003Z dynamo/test_repros.py::ReproTests::test_out_none PASSED [0.0191s] [ 42%] 2025-09-07T06:59:42.7918228Z 
dynamo/test_repros.py::ReproTests::test_out_overload_non_contiguous PASSED [0.0325s] [ 42%] 2025-09-07T06:59:42.7918466Z dynamo/test_repros.py::ReproTests::test_out_root_cell_shape_change PASSED [0.0406s] [ 42%] 2025-09-07T06:59:42.7918708Z dynamo/test_repros.py::ReproTests::test_out_root_cell_tuple_shape_change PASSED [0.0432s] [ 43%] 2025-09-07T06:59:42.7918959Z dynamo/test_repros.py::ReproTests::test_output_aliases_intermediate PASSED [0.0528s] [ 43%] 2025-09-07T06:59:42.7919224Z dynamo/test_repros.py::ReproTests::test_overlapping_inputs_with_dynamic_shapes_error PASSED [0.0009s] [ 44%] 2025-09-07T06:59:42.7919480Z dynamo/test_repros.py::ReproTests::test_overwriting_params PASSED [0.0312s] [ 44%] 2025-09-07T06:59:42.7919725Z dynamo/test_repros.py::ReproTests::test_partially_initialized_module_property PASSED [0.0244s] [ 44%] 2025-09-07T06:59:42.7920963Z dynamo/test_repros.py::ReproTests::test_partitioner_activation_memory_budget_with_unbacked_symints PASSED [0.2530s] [ 45%] 2025-09-07T06:59:42.7921272Z dynamo/test_repros.py::ReproTests::test_partitioner_cse_respects_mutation_boundaries PASSED [0.0010s] [ 45%] 2025-09-07T06:59:42.7921532Z dynamo/test_repros.py::ReproTests::test_pointless_graph_removal PASSED [0.0403s] [ 46%] 2025-09-07T06:59:42.7921754Z dynamo/test_repros.py::ReproTests::test_primtorch PASSED [0.0277s] [ 46%] 2025-09-07T06:59:42.7921973Z dynamo/test_repros.py::ReproTests::test_primtorch_no_graph_break XFAIL [0.0088s] [ 46%] 2025-09-07T06:59:42.7922199Z dynamo/test_repros.py::ReproTests::test_randint_out_dynamic PASSED [1.1421s] [ 47%] 2025-09-07T06:59:42.7922413Z dynamo/test_repros.py::ReproTests::test_recursive_map PASSED [0.0240s] [ 47%] 2025-09-07T06:59:42.7922619Z dynamo/test_repros.py::ReproTests::test_reformer_eval PASSED [0.0938s] [ 48%] 2025-09-07T06:59:42.7922839Z dynamo/test_repros.py::ReproTests::test_reformer_min_chunk_len PASSED [0.0244s] [ 48%] 2025-09-07T06:59:42.7923057Z dynamo/test_repros.py::ReproTests::test_reformer_sorting 
PASSED [0.3469s] [ 48%] 2025-09-07T06:59:42.7923264Z dynamo/test_repros.py::ReproTests::test_reformer_train PASSED [0.0883s] [ 49%] 2025-09-07T06:59:42.7924423Z dynamo/test_repros.py::ReproTests::test_reinplacing W0907 06:59:27.830000 252905 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-09-07T06:59:42.7924696Z PASSED [0.4197s] [ 49%] 2025-09-07T06:59:42.7925244Z dynamo/test_repros.py::ReproTests::test_relative_import SKIPPED [0.0006s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156679 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 50%] 2025-09-07T06:59:42.7926241Z dynamo/test_repros.py::ReproTests::test_relative_import_no_modulename SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/156691 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 50%] 2025-09-07T06:59:42.7927014Z dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode1 PASSED [0.6443s] [ 50%] 2025-09-07T06:59:42.7927334Z dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode2 PASSED [0.4354s] [ 51%] 2025-09-07T06:59:42.7927585Z dynamo/test_repros.py::ReproTests::test_restricted_list_subclass1 PASSED [0.0181s] [ 51%] 2025-09-07T06:59:42.7927821Z dynamo/test_repros.py::ReproTests::test_restricted_list_subclass2 PASSED [0.0131s] [ 52%] 2025-09-07T06:59:42.7928055Z dynamo/test_repros.py::ReproTests::test_restricted_list_subclass3 PASSED [0.0129s] [ 52%] 2025-09-07T06:59:42.7928304Z dynamo/test_repros.py::ReproTests::test_return_value_duplication_mixed_grad PASSED [0.0253s] [ 52%] 2025-09-07T06:59:42.7929496Z dynamo/test_repros.py::ReproTests::test_return_value_duplication_scalar PASSED [0.0304s] [ 53%] 2025-09-07T06:59:42.7929758Z dynamo/test_repros.py::ReproTests::test_return_value_duplication_tensor PASSED [0.0250s] [ 53%] 2025-09-07T06:59:42.7929990Z dynamo/test_repros.py::ReproTests::test_return_weakref PASSED [0.0111s] [ 54%] 2025-09-07T06:59:42.7930230Z dynamo/test_repros.py::ReproTests::test_rewrite_assert_dont_change_bytecode PASSED [0.0244s] [ 54%] 2025-09-07T06:59:42.7930472Z dynamo/test_repros.py::ReproTests::test_rewrite_assert_noop PASSED [0.4557s] [ 54%] 2025-09-07T06:59:42.7930697Z dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_msg PASSED [0.0291s] [ 55%] 2025-09-07T06:59:42.7930944Z dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_non_string_msg PASSED [0.0213s] [ 55%] 2025-09-07T06:59:42.7931192Z dynamo/test_repros.py::ReproTests::test_rewrite_assert_without_msg PASSED [0.0206s] [ 56%] 2025-09-07T06:59:42.7931477Z dynamo/test_repros.py::ReproTests::test_rng_state PASSED [0.0320s] [ 56%] 2025-09-07T06:59:42.7931685Z dynamo/test_repros.py::ReproTests::test_seq_append_list PASSED [0.0267s] [ 56%] 2025-09-07T06:59:42.7931919Z 
dynamo/test_repros.py::ReproTests::test_setattr_requires_grad_graph_breaks PASSED [0.0789s] [ 57%] 2025-09-07T06:59:42.7933143Z dynamo/test_repros.py::ReproTests::test_setitem_boolean_mask_diff W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Backend compiler exception 2025-09-07T06:59:42.7933581Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Explanation: Backend compiler `aot_eager` failed with aten.nonzero.default. Adding a graph break. 2025-09-07T06:59:42.7933952Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Hint: Report an issue to the backend compiler repo. 2025-09-07T06:59:42.7934201Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] 2025-09-07T06:59:42.7934446Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Developer debug context: Backend: aot_eager 2025-09-07T06:59:42.7934727Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Exception:aten.nonzero.default 2025-09-07T06:59:42.7934975Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Traceback: 2025-09-07T06:59:42.7935288Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 2139, in fn 2025-09-07T06:59:42.7935594Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] return x 2025-09-07T06:59:42.7935803Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] 2025-09-07T06:59:42.7935985Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] 2025-09-07T06:59:42.7937275Z W0907 06:59:30.029000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0219.html 2025-09-07T06:59:42.7937665Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 
Backend compiler exception 2025-09-07T06:59:42.7938010Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Explanation: Backend compiler `aot_eager` failed with aten.nonzero.default. Adding a graph break. 2025-09-07T06:59:42.7938439Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Hint: Report an issue to the backend compiler repo. 2025-09-07T06:59:42.7938687Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 2025-09-07T06:59:42.7938927Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Developer debug context: Backend: aot_eager 2025-09-07T06:59:42.7939208Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Exception:aten.nonzero.default 2025-09-07T06:59:42.7939461Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Traceback: 2025-09-07T06:59:42.7939770Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 2139, in fn 2025-09-07T06:59:42.7940079Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] return x 2025-09-07T06:59:42.7940289Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 2025-09-07T06:59:42.7941344Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 2025-09-07T06:59:42.7941694Z W0907 06:59:30.044000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0219.html 2025-09-07T06:59:42.7942004Z PASSED [0.0407s] [ 57%] 2025-09-07T06:59:42.7942216Z dynamo/test_repros.py::ReproTests::test_setitem_tensor_prop PASSED [0.0156s] [ 58%] 2025-09-07T06:59:42.7942552Z dynamo/test_repros.py::ReproTests::test_setitem_tuple_boolean_mask_diff W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Backend compiler 
exception 2025-09-07T06:59:42.7942986Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Explanation: Backend compiler `aot_eager` failed with aten.nonzero.default. Adding a graph break. 2025-09-07T06:59:42.7943355Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Hint: Report an issue to the backend compiler repo. 2025-09-07T06:59:42.7943600Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] 2025-09-07T06:59:42.7943837Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Developer debug context: Backend: aot_eager 2025-09-07T06:59:42.7944116Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Exception:aten.nonzero.default 2025-09-07T06:59:42.7944364Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] Traceback: 2025-09-07T06:59:42.7945532Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 2151, in fn 2025-09-07T06:59:42.7945837Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] return x 2025-09-07T06:59:42.7946043Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] 2025-09-07T06:59:42.7946228Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] 2025-09-07T06:59:42.7946641Z W0907 06:59:30.086000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0] For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0219.html 2025-09-07T06:59:42.7947020Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Backend compiler exception 2025-09-07T06:59:42.7947367Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Explanation: Backend compiler `aot_eager` failed with aten.nonzero.default. Adding a graph break. 
2025-09-07T06:59:42.7947734Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Hint: Report an issue to the backend compiler repo. 2025-09-07T06:59:42.7947982Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 2025-09-07T06:59:42.7948274Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Developer debug context: Backend: aot_eager 2025-09-07T06:59:42.7949451Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Exception:aten.nonzero.default 2025-09-07T06:59:42.7949706Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] Traceback: 2025-09-07T06:59:42.7950013Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] File "/var/lib/jenkins/pytorch/test/dynamo/test_repros.py", line 2151, in fn 2025-09-07T06:59:42.7950325Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] return x 2025-09-07T06:59:42.7950532Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 2025-09-07T06:59:42.7950716Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] 2025-09-07T06:59:42.7951067Z W0907 06:59:30.101000 252905 site-packages/torch/_dynamo/exc.py:593] [0/0_1] For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0219.html 2025-09-07T06:59:42.7951376Z PASSED [0.0394s] [ 58%] 2025-09-07T06:59:42.7951527Z dynamo/test_repros.py::ReproTests::test_sigmoid_out PASSED [0.0408s] [ 58%] 2025-09-07T06:59:42.7951737Z dynamo/test_repros.py::ReproTests::test_sigmoid_out2 PASSED [0.0118s] [ 59%] 2025-09-07T06:59:42.7951944Z dynamo/test_repros.py::ReproTests::test_size_typematch PASSED [0.0170s] [ 59%] 2025-09-07T06:59:42.7953072Z dynamo/test_repros.py::ReproTests::test_slice_into_list_mutable PASSED [0.0162s] [ 60%] 2025-09-07T06:59:42.7953306Z dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape PASSED 
[0.0214s] [ 60%] 2025-09-07T06:59:42.7953542Z dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape_setitem PASSED [0.0316s] [ 60%] 2025-09-07T06:59:42.7953768Z dynamo/test_repros.py::ReproTests::test_sort_out PASSED [0.0454s] [ 61%] 2025-09-07T06:59:42.7953971Z dynamo/test_repros.py::ReproTests::test_sort_out2 PASSED [0.0130s] [ 61%] 2025-09-07T06:59:42.7954185Z dynamo/test_repros.py::ReproTests::test_specialized_stride PASSED [0.0087s] [ 62%] 2025-09-07T06:59:42.7954417Z dynamo/test_repros.py::ReproTests::test_split_with_sizes_aot_autograd PASSED [0.0858s] [ 62%] 2025-09-07T06:59:42.7954659Z dynamo/test_repros.py::ReproTests::test_staticmethod_allow_in_graph PASSED [0.0110s] [ 62%] 2025-09-07T06:59:42.7954892Z dynamo/test_repros.py::ReproTests::test_stk_sdd_is_transposed PASSED [0.0183s] [ 63%] 2025-09-07T06:59:42.7955125Z dynamo/test_repros.py::ReproTests::test_stop_iteration_reconstruct PASSED [0.0097s] [ 63%] 2025-09-07T06:59:42.7955345Z dynamo/test_repros.py::ReproTests::test_str_isalnum PASSED [0.0091s] [ 64%] 2025-09-07T06:59:42.7956403Z dynamo/test_repros.py::ReproTests::test_string_format PASSED [0.0100s] [ 64%] 2025-09-07T06:59:42.7956692Z dynamo/test_repros.py::ReproTests::test_subclass_graph_output_repro PASSED [0.0244s] [ 64%] 2025-09-07T06:59:42.7956920Z dynamo/test_repros.py::ReproTests::test_super_classmethod PASSED [0.0128s] [ 65%] 2025-09-07T06:59:42.7957151Z dynamo/test_repros.py::ReproTests::test_super_classmethod_inheritance PASSED [0.0115s] [ 65%] 2025-09-07T06:59:42.7957377Z dynamo/test_repros.py::ReproTests::test_super_diamond PASSED [0.0109s] [ 66%] 2025-09-07T06:59:42.7957948Z dynamo/test_repros.py::ReproTests::test_super_in_staticmethod W0907 06:59:30.537000 252905 site-packages/torch/_dynamo/variables/builtin.py:1043] [0/0] incorrect arg count missing a required argument: 'a' and no constant handler 2025-09-07T06:59:42.7958388Z PASSED [0.0098s] [ 66%] 2025-09-07T06:59:42.7958540Z 
dynamo/test_repros.py::ReproTests::test_super_staticmethod PASSED [0.0101s] [ 66%] 2025-09-07T06:59:42.7958866Z dynamo/test_repros.py::ReproTests::test_swin_base_tensor_attr W0907 06:59:30.557000 252905 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-09-07T06:59:42.7959138Z PASSED [0.4438s] [ 67%] 2025-09-07T06:59:42.7959333Z dynamo/test_repros.py::ReproTests::test_symint_bitwise PASSED [0.4217s] [ 67%] 2025-09-07T06:59:42.7960431Z dynamo/test_repros.py::ReproTests::test_symnode_is_not_op PASSED [0.0179s] [ 68%] 2025-09-07T06:59:42.7960645Z dynamo/test_repros.py::ReproTests::test_symnode_is_op PASSED [0.0166s] [ 68%] 2025-09-07T06:59:42.7960851Z dynamo/test_repros.py::ReproTests::test_sys_monitoring PASSED [0.0131s] [ 68%] 2025-09-07T06:59:42.7961059Z dynamo/test_repros.py::ReproTests::test_tensor_data_kwarg PASSED [0.0100s] [ 69%] 2025-09-07T06:59:42.7961283Z dynamo/test_repros.py::ReproTests::test_tensor_isinstance_tuple PASSED [0.0095s] [ 69%] 2025-09-07T06:59:42.7961501Z dynamo/test_repros.py::ReproTests::test_tensor_item PASSED [0.4276s] [ 70%] 2025-09-07T06:59:42.7961704Z dynamo/test_repros.py::ReproTests::test_tensor_random PASSED [0.0368s] [ 70%] 2025-09-07T06:59:42.7961951Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func1 PASSED [0.0007s] [ 70%] 2025-09-07T06:59:42.7962249Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func2 PASSED [0.0239s] [ 71%] 2025-09-07T06:59:42.7962544Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func3 PASSED [0.0231s] [ 71%] 2025-09-07T06:59:42.7962831Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func1 PASSED [0.0119s] [ 72%] 2025-09-07T06:59:42.7963961Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func2 PASSED [0.0140s] [ 72%] 2025-09-07T06:59:42.7964290Z 
dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func3 PASSED [0.0149s] [ 72%] 2025-09-07T06:59:42.7964574Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func1 PASSED [0.0007s] [ 73%] 2025-09-07T06:59:42.7964865Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func2 PASSED [0.0493s] [ 73%] 2025-09-07T06:59:42.7965154Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func3 PASSED [0.3190s] [ 74%] 2025-09-07T06:59:42.7965426Z dynamo/test_repros.py::ReproTests::test_tensor_set_data_mismatched_dtype PASSED [0.0158s] [ 74%] 2025-09-07T06:59:42.7965658Z dynamo/test_repros.py::ReproTests::test_tensor_split PASSED [0.4952s] [ 74%] 2025-09-07T06:59:42.7965885Z dynamo/test_repros.py::ReproTests::test_tensor_split_within_device_cm PASSED [0.0667s] [ 75%] 2025-09-07T06:59:42.7966115Z dynamo/test_repros.py::ReproTests::test_tensor_uniform PASSED [0.3819s] [ 75%] 2025-09-07T06:59:42.7966320Z dynamo/test_repros.py::ReproTests::test_threading_local PASSED [0.0113s] [ 76%] 2025-09-07T06:59:42.7966635Z dynamo/test_repros.py::ReproTests::test_tokenization PASSED [0.0209s] [ 76%] 2025-09-07T06:59:42.7967867Z dynamo/test_repros.py::ReproTests::test_torch_compile_in_compile_frame PASSED [0.0208s] [ 76%] 2025-09-07T06:59:42.7968100Z dynamo/test_repros.py::ReproTests::test_torch_ops_aten PASSED [0.0110s] [ 77%] 2025-09-07T06:59:42.7968311Z dynamo/test_repros.py::ReproTests::test_torch_tensor_ops PASSED [0.0097s] [ 77%] 2025-09-07T06:59:42.7968543Z dynamo/test_repros.py::ReproTests::test_torch_tensor_ops_no_graph_break PASSED [0.0088s] [ 78%] 2025-09-07T06:59:42.7968782Z dynamo/test_repros.py::ReproTests::test_torch_variable_type PASSED [0.0043s] [ 78%] 2025-09-07T06:59:42.7968993Z dynamo/test_repros.py::ReproTests::test_torchname PASSED [0.0079s] [ 78%] 2025-09-07T06:59:42.7969216Z dynamo/test_repros.py::ReproTests::test_trace_functional_tensor_with 
PASSED [0.0723s] [ 79%] 2025-09-07T06:59:42.7969457Z dynamo/test_repros.py::ReproTests::test_tuple_enum_as_key_dict PASSED [0.0620s] [ 79%] 2025-09-07T06:59:42.7969675Z dynamo/test_repros.py::ReproTests::test_typed_dict PASSED [0.0121s] [ 80%] 2025-09-07T06:59:42.7969883Z dynamo/test_repros.py::ReproTests::test_typed_dict_total PASSED [0.0098s] [ 80%] 2025-09-07T06:59:42.7970106Z dynamo/test_repros.py::ReproTests::test_udf_classes_reconstruction PASSED [0.0189s] [ 80%] 2025-09-07T06:59:42.7971249Z dynamo/test_repros.py::ReproTests::test_unbacked_arange_in_bounds PASSED [0.0355s] [ 81%] 2025-09-07T06:59:42.7971473Z dynamo/test_repros.py::ReproTests::test_unbind_copy_out PASSED [0.0123s] [ 81%] 2025-09-07T06:59:42.7971696Z dynamo/test_repros.py::ReproTests::test_unpack_hooks_can_be_disabled PASSED [0.0263s] [ 82%] 2025-09-07T06:59:42.7971950Z dynamo/test_repros.py::ReproTests::test_unpack_hooks_dont_run_during_tracing PASSED [0.0243s] [ 82%] 2025-09-07T06:59:42.7972242Z dynamo/test_repros.py::ReproTests::test_unspecialized_nn_module_with_torch_variable_attribute PASSED [0.0156s] [ 82%] 2025-09-07T06:59:42.7972509Z dynamo/test_repros.py::ReproTests::test_unsqueeze_mul_strides PASSED [0.4218s] [ 83%] 2025-09-07T06:59:42.7972731Z dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager PASSED [0.0117s] [ 83%] 2025-09-07T06:59:42.7972969Z dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init PASSED [0.0109s] [ 84%] 2025-09-07T06:59:42.7973242Z dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init_graph_break PASSED [0.0225s] [ 84%] 2025-09-07T06:59:42.7973493Z dynamo/test_repros.py::ReproTests::test_user_defined_iter PASSED [0.0129s] [ 84%] 2025-09-07T06:59:42.7990371Z dynamo/test_repros.py::ReproTests::test_user_defined_object_callable PASSED [0.0095s] [ 85%] 2025-09-07T06:59:42.7990735Z dynamo/test_repros.py::ReproTests::test_validate_model_kwargs PASSED [0.0341s] [ 85%] 2025-09-07T06:59:42.7991067Z 
dynamo/test_repros.py::ReproTests::test_vc_bumped_in_inference_graph PASSED [0.3405s] [ 86%] 2025-09-07T06:59:42.7991440Z dynamo/test_repros.py::ReproTests::test_vdd_duplicate_error PASSED [0.0161s] [ 86%] 2025-09-07T06:59:42.7991707Z dynamo/test_repros.py::ReproTests::test_view_dtype_overload PASSED [0.0335s] [ 86%] 2025-09-07T06:59:42.7991980Z dynamo/test_repros.py::ReproTests::test_weakref PASSED [0.0107s] [ 87%] 2025-09-07T06:59:42.7992250Z dynamo/test_repros.py::ReproTests::test_weakref_callback PASSED [0.0445s] [ 87%] 2025-09-07T06:59:42.7992520Z dynamo/test_repros.py::ReproTests::test_weakref_construction PASSED [0.0102s] [ 88%] 2025-09-07T06:59:42.7992778Z dynamo/test_repros.py::ReproTests::test_weakref_del PASSED [0.4352s] [ 88%] 2025-09-07T06:59:42.7993031Z dynamo/test_repros.py::ReproTests::test_weakref_proxy PASSED [0.0104s] [ 88%] 2025-09-07T06:59:42.7993291Z dynamo/test_repros.py::ReproTests::test_weakref_reconstruct PASSED [0.0267s] [ 89%] 2025-09-07T06:59:42.7995581Z dynamo/test_repros.py::ReproTests::test_while_loop_graph_break PASSED [0.0150s] [ 89%] 2025-09-07T06:59:42.7995858Z dynamo/test_repros.py::ReproTests::test_while_loop_graph_break_inside_call_function PASSED [0.0254s] [ 90%] 2025-09-07T06:59:42.7996127Z dynamo/test_repros.py::ReproTests::test_with_on_graph_break_inst PASSED [0.0670s] [ 90%] 2025-09-07T06:59:42.7996364Z dynamo/test_repros.py::ReproTests::test_with_on_graph_break_nested PASSED [0.0723s] [ 90%] 2025-09-07T06:59:42.7996668Z dynamo/test_repros.py::ReproTests::test_zeros_out_dynamic PASSED [0.6979s] [ 91%] 2025-09-07T06:59:42.7996899Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_cuda_sync_cuda PASSED [0.0331s] [ 91%] 2025-09-07T06:59:42.7997173Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_data_dependent_error_log_no_print_cuda PASSED [0.0133s] [ 92%] 2025-09-07T06:59:42.7997557Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_deepcopy_constant_tensor_in_aot_bwd_cuda PASSED [0.0499s] [ 92%] 
2025-09-07T06:59:42.7997852Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_safe_grad_warning_cuda PASSED [0.0008s] [ 92%] 2025-09-07T06:59:42.7998138Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_user_warnings_cuda PASSED [0.0006s] [ 93%] 2025-09-07T06:59:42.7998408Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_warnings_cuda PASSED [0.0617s] [ 93%] 2025-09-07T06:59:42.8000140Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_flash_attn_backward_mixed_strides_cuda SKIPPED [0.0002s] (flash attention not supported) [ 94%] 2025-09-07T06:59:42.8000543Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_getattr_return_cuda PASSED [0.0147s] [ 94%] 2025-09-07T06:59:42.8000808Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_guard_default_device_cuda PASSED [0.0253s] [ 94%] 2025-09-07T06:59:42.8001102Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_megablocks_moe_cuda SKIPPED [0.0009s] (requires megablocks) [ 95%] 2025-09-07T06:59:42.8001422Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_memleak_when_graph_input_has_tensor_attr_cuda PASSED [0.0273s] [ 95%] 2025-09-07T06:59:42.8001724Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_module_attribute_error_cuda PASSED [0.0179s] [ 96%] 2025-09-07T06:59:42.8001993Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_named_tuple_vt_clone_cuda PASSED [0.0216s] [ 96%] 2025-09-07T06:59:42.8002251Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_norm_dtype_cuda PASSED [0.0396s] [ 96%] 2025-09-07T06:59:42.8002529Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partitioner_saves_weights_for_bw_cuda PASSED [2.9881s] [ 97%] 2025-09-07T06:59:42.8002821Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sdpa_dynamic_shapes_cuda PASSED [0.7040s] [ 97%] 2025-09-07T06:59:42.8003091Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sub_alpha_scalar_repro_cuda PASSED [0.0200s] [ 98%] 2025-09-07T06:59:42.8004293Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_tensor_size_hasattr_cuda PASSED 
[0.0115s] [ 98%] 2025-09-07T06:59:42.8004572Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_torch_cuda_is_initialized_cuda PASSED [0.0191s] [ 98%] 2025-09-07T06:59:42.8004916Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_truthiness_of_symints_no_recompiles_cuda PASSED [0.0145s] [ 99%] 2025-09-07T06:59:42.8005202Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_udf_class_source_cuda PASSED [0.0116s] [ 99%] 2025-09-07T06:59:42.8005482Z dynamo/test_repros.py::ReproTestsDeviceCUDA::test_zero_dim_param_mixed_device_grad_cuda PASSED [0.0350s] [100%] 2025-09-07T06:59:42.8005652Z 2025-09-07T06:59:42.8005860Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-ca4656eacb8dd0c3.xml - 2025-09-07T06:59:42.8006172Z ===== 237 passed, 10 skipped, 83 deselected, 3 xfailed in 62.94s (0:01:02) ===== 2025-09-07T06:59:42.8006460Z The following tests failed and then succeeded when run in a new process['test/dynamo/test_repros.py::ReproTests::test_get_parameter_dtype'] 2025-09-07T06:59:42.8006832Z 2025-09-07T06:59:42.8006979Z FINISHED PRINTING LOG FILE of dynamo/test_repros 1/1 (test/test-reports/dynamo.test_repros_1.1_b10c530c279eac19_.log) 2025-09-07T06:59:42.8007159Z 2025-09-07T06:59:42.8007239Z Running export/test_draft_export 1/1 ... [2025-09-07 06:59:42.761372] 2025-09-07T06:59:42.8008373Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T06:59:42.8008762Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'export/test_draft_export.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 06:59:42.761619]
2025-09-07T06:59:57.9585443Z
2025-09-07T06:59:57.9586858Z export/test_draft_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_draft_export_1.1_e34fe6fb70cc58c7_.log
2025-09-07T06:59:57.9594583Z Running 21 items in this shard: test/export/test_draft_export.py::TestDraftExport::test_complex_data_dependent_expr, test/export/test_draft_export.py::TestDraftExport::test_constantify_unbacked_symbol, test/export/test_draft_export.py::TestDraftExport::test_cuda_memory_usage, test/export/test_draft_export.py::TestDraftExport::test_data_dependent_failure, test/export/test_draft_export.py::TestDraftExport::test_dedup_data_dependent_failure, test/export/test_draft_export.py::TestDraftExport::test_fake_infer_dense_in_memory_check, test/export/test_draft_export.py::TestDraftExport::test_masked_linear, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_basic, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_multiple_profiles, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_update_profile, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_guard, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_impl, test/export/test_draft_export.py::TestDraftExport::test_offsets, test/export/test_draft_export.py::TestDraftExport::test_override_incorrectly_aliasing_kernel, test/export/test_draft_export.py::TestDraftExport::test_override_mismatched_fake_kernel_with_unbacked_symbols, test/export/test_draft_export.py::TestDraftExport::test_override_size_and_dtype_mismatched_fake_kernels, test/export/test_draft_export.py::TestDraftExport::test_shape_failure, test/export/test_draft_export.py::TestDraftExport::test_side_effect1, test/export/test_draft_export.py::TestDraftExport::test_side_effect_inps, test/export/test_draft_export.py::TestDraftExport::test_torchbind, test/export/test_draft_export.py::TestDraftExport::test_unbacked_div_mod_replacement
2025-09-07T06:59:57.9598613Z
2025-09-07T06:59:57.9598704Z Running export/test_export_strict 1/1 ... [2025-09-07 06:59:57.958382]
2025-09-07T06:59:57.9598895Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T06:59:57.9605314Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'export/test_export_strict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 06:59:57.958609]
2025-09-07T07:02:08.9293814Z
2025-09-07T07:02:08.9295057Z export/test_export_strict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_strict_1.1_314d130ca734f8a3_.log
2025-09-07T07:02:08.9365281Z Running 416 items in this shard: test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_assume_static_by_default_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_constraints_error_not_in_range_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_constraints_error_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_inline_constraints_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_slice_maxsize_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_slice_unbacked_dim1_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_no_grad_param_inplace_strict, test/export/test_export_strict.py::StrictExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test__scaled_dot_product_flash_attention_strict, test/export/test_export_strict.py::StrictExportTestExport::test_additional_inputs_constants_strict, test/export/test_export_strict.py::StrictExportTestExport::test_allow_explicit_guards_as_runtime_asserts_strict, test/export/test_export_strict.py::StrictExportTestExport::test_args_type_checked_strict, test/export/test_export_strict.py::StrictExportTestExport::test_aten_lift_fresh_copy_strict, test/export/test_export_strict.py::StrictExportTestExport::test_attention_strict, test/export/test_export_strict.py::StrictExportTestExport::test_attr_assignment_extra_strict, test/export/test_export_strict.py::StrictExportTestExport::test_automatic_constrain_size_strict, test/export/test_export_strict.py::StrictExportTestExport::test_automatic_dynamic_shapes_constant_relation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_automatic_dynamic_shapes_linear_relation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_automatic_dynamic_shapes_simple_equality_strict, test/export/test_export_strict.py::StrictExportTestExport::test_baddbmm_strict, test/export/test_export_strict.py::StrictExportTestExport::test_basic_non_strict_fake_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_basic_non_strict_real_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_basic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_bincount_strict, test/export/test_export_strict.py::StrictExportTestExport::test_buffer_util_strict, test/export/test_export_strict.py::StrictExportTestExport::test_capture_subclass_constructor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_capture_subclass_constructor_torch_ir_strict, test/export/test_export_strict.py::StrictExportTestExport::test_capture_subclass_wrong_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_ccode_python_mod_strict, test/export/test_export_strict.py::StrictExportTestExport::test_check_specialized_int_strict, test/export/test_export_strict.py::StrictExportTestExport::test_checks_to_constrain_range_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cleanup_dynamic_markers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_colin_unbacked_backed_vr_sub_strict, test/export/test_export_strict.py::StrictExportTestExport::test_colon_parameter_strict, test/export/test_export_strict.py::StrictExportTestExport::test_compiling_state_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_access_identical_symint_closure_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_branches_return_constant_int_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_branches_return_same_int_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_buffers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_contains_unbacked_no_escape_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_int_closure_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_unflatten_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_with_module_stack_export_with_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cond_with_module_stack_export_with_unflatten_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_aliasing_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_input_naming_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_no_user_inp_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_output_dup_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_constant_output_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_requires_grad_const_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_return_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_tensor_mutation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_tensor_with_non_functional_nested_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constant_tensor_with_non_functional_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constrain_decomp_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constrain_size_in_eager_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constrain_size_with_constrain_value_strict, test/export/test_export_strict.py::StrictExportTestExport::test_constrain_size_with_various_cases_strict, test/export/test_export_strict.py::StrictExportTestExport::test_conv_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_crop_like_strict, test/export/test_export_strict.py::StrictExportTestExport::test_cse_for_symint_strict, test/export/test_export_strict.py::StrictExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_strict, test/export/test_export_strict.py::StrictExportTestExport::test_custom_op_auto_functionalize_strict, test/export/test_export_strict.py::StrictExportTestExport::test_custom_op_auto_warn_pre_dispatch_strict, test/export/test_export_strict.py::StrictExportTestExport::test_custom_op_preserve_strict, test/export/test_export_strict.py::StrictExportTestExport::test_custom_pytree_strict, test/export/test_export_strict.py::StrictExportTestExport::test_custom_tag_metadata_re_export_strict, test/export/test_export_strict.py::StrictExportTestExport::test_decomp_batch_norm_functional_predispatch_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_decomp_item_in_prim_after_decomposition_strict, test/export/test_export_strict.py::StrictExportTestExport::test_decomp_item_in_prim_before_decomposition_strict, test/export/test_export_strict.py::StrictExportTestExport::test_default_decomposition_core_cia_ops_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_1_2_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_basic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_integer_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_nested_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_out_of_order_repeat_derived_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_out_of_order_simplified_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_out_of_order_strict, test/export/test_export_strict.py::StrictExportTestExport::test_derived_dim_repeat_derived_strict, test/export/test_export_strict.py::StrictExportTestExport::test_detect_leak_nonstrict_strict, test/export/test_export_strict.py::StrictExportTestExport::test_detect_leak_nonstrict_with_stacktrace_strict, test/export/test_export_strict.py::StrictExportTestExport::test_detect_leak_strict_strict, test/export/test_export_strict.py::StrictExportTestExport::test_device_to_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_device_to_gpu_strict, test/export/test_export_strict.py::StrictExportTestExport::test_device_to_mutation_float_strict, test/export/test_export_strict.py::StrictExportTestExport::test_device_to_mutation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_device_to_static_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_dim_1_2_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dim_auto_and_dim_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dim_dynamic_divisibility_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dim_dynamic_specialization_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dim_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dim_hint_range_violations_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dim_hint_ranges_strict, test/export/test_export_strict.py::StrictExportTestExport::test_disable_forced_specializations_errors_strict, test/export/test_export_strict.py::StrictExportTestExport::test_disable_forced_specializations_ok_strict, test/export/test_export_strict.py::StrictExportTestExport::test_distributed_all_gather_into_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_distributed_all_gather_strict, test/export/test_export_strict.py::StrictExportTestExport::test_distributed_all_reduce_strict, test/export/test_export_strict.py::StrictExportTestExport::test_distributed_all_to_all_single_strict, test/export/test_export_strict.py::StrictExportTestExport::test_distributed_reduce_scatter_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dont_duck_size_for_auto_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_double_lifted_constants_strict, test/export/test_export_strict.py::StrictExportTestExport::test_draft_export_checks_aliasing_strict, test/export/test_export_strict.py::StrictExportTestExport::test_draft_export_checks_mutation_list_strict, test/export/test_export_strict.py::StrictExportTestExport::test_draft_export_checks_mutation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_draft_export_checks_mutation_with_nan_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_draft_export_fake_kernel_inference_errors_strict, test/export/test_export_strict.py::StrictExportTestExport::test_draft_export_infers_fake_kernel_strict, test/export/test_export_strict.py::StrictExportTestExport::test_duplicate_modules_with_non_persistent_buffers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_lr_shift_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_bounds_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_builder_basic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_builder_kwargs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_builder_pytree_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_dataclass_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_inferred_basic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_serdes_generic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_serdes_user_errors_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_serdes_various_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_shapes_spec_with_pytree_strict, test/export/test_export_strict.py::StrictExportTestExport::test_dynamic_sym_round_strict, test/export/test_export_strict.py::StrictExportTestExport::test_ends_of_bounds_oblivious_strict, test/export/test_export_strict.py::StrictExportTestExport::test_error_does_not_reference_eager_fallback_strict, test/export/test_export_strict.py::StrictExportTestExport::test_error_when_passing_mutating_primitive_op_strict, test/export/test_export_strict.py::StrictExportTestExport::test_exception_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_expand_copy_export_handles_implicit_true_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_api_with_dynamic_shapes_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_as_backend_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_associative_scan_lifted_buffers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_associative_scan_symbol_dim_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_associative_scan_symbol_scandim_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_aten_to_unflatten_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_aten_to_unflatten_subclass_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_cond_symbool_pred_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_cond_warns_constant_pred_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_custom_decomp_table_basic_pop_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_custom_decomp_table_container_methods_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_custom_op_lib_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_custom_triton_kernel_mutable_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_custom_triton_kernel_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_cyclic_reference_leak_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_export_decomp_torture_case_1_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_decomp_torture_case_2_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_decomps_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_decomps_simple_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_dynamo_config_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_for_training_run_decomp_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_for_training_with_container_type_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_for_training_with_dynamic_shapes_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_for_training_with_mutation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_for_training_with_state_dict_hooks_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_default_kwargs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_keyword_only_args_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_kwargs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_pytree_kwargs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_var_keyword_args_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_var_keyword_pytree_args_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_func_with_var_postional_args_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_function_schema_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_graph_with_no_inputs_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_export_input_mutation_bug_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_input_mutation_dynamic_shape_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_input_mutation_static_shape_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_linear_preserve_dynamic_shape_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_max_nonstrict_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_max_onnx_reported_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_method_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_mod_constraints_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_module_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_preserve_linear_at_aot_level_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_preserve_linear_but_not_custom_op_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_scan_pytree_output_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_script_module_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_statically_known_true_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_then_compile_tensor_ctor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_with_autocast_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_with_fake_tensor_inputs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_with_inline_constraints_complex_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_export_with_inline_constraints_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_with_set_grad_enabled_strict, test/export/test_export_strict.py::StrictExportTestExport::test_export_with_wrong_inputs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_external_call_non_strict_real_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_fake_inputs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_fake_weights_strict, test/export/test_export_strict.py::StrictExportTestExport::test_filter_traceback_frames_strict, test/export/test_export_strict.py::StrictExportTestExport::test_float_conversion_from_int_strict, test/export/test_export_strict.py::StrictExportTestExport::test_float_conversion_strict, test/export/test_export_strict.py::StrictExportTestExport::test_fqn_strict, test/export/test_export_strict.py::StrictExportTestExport::test_from_node_metadata_export_strict, test/export/test_export_strict.py::StrictExportTestExport::test_full_on_scalar_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_function_holding_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_hints_wrapper_strict, test/export/test_export_strict.py::StrictExportTestExport::test_hoo_inline_users_issue_strict, test/export/test_export_strict.py::StrictExportTestExport::test_if_functional_strict, test/export/test_export_strict.py::StrictExportTestExport::test_if_post_autograd_op_preserved_strict, test/export/test_export_strict.py::StrictExportTestExport::test_inline_script_class_method_recursive_strict, test/export/test_export_strict.py::StrictExportTestExport::test_inline_script_class_method_strict, test/export/test_export_strict.py::StrictExportTestExport::test_inline_script_function_strict, test/export/test_export_strict.py::StrictExportTestExport::test_inline_script_method_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_int_shape_specialization_strict, test/export/test_export_strict.py::StrictExportTestExport::test_intermediate_shape_comp_strict, test/export/test_export_strict.py::StrictExportTestExport::test_is_exporting_strict, test/export/test_export_strict.py::StrictExportTestExport::test_is_non_negative_check_function_strict, test/export/test_export_strict.py::StrictExportTestExport::test_is_nonzero_strict, test/export/test_export_strict.py::StrictExportTestExport::test_isnonzero_strict, test/export/test_export_strict.py::StrictExportTestExport::test_issue_113041_strict, test/export/test_export_strict.py::StrictExportTestExport::test_issue_157289_strict, test/export/test_export_strict.py::StrictExportTestExport::test_istft_op_strict, test/export/test_export_strict.py::StrictExportTestExport::test_keep_composite_ops_invalid_strict, test/export/test_export_strict.py::StrictExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_strict, test/export/test_export_strict.py::StrictExportTestExport::test_keep_composite_ops_linear_convd_strict, test/export/test_export_strict.py::StrictExportTestExport::test_kwarg_dynamic_shapes_diff_order_strict, test/export/test_export_strict.py::StrictExportTestExport::test_kwargs_reorder_strict, test/export/test_export_strict.py::StrictExportTestExport::test_layer_norm_unbacked_normalized_shape_strict, test/export/test_export_strict.py::StrictExportTestExport::test_layer_sharing_strict, test/export/test_export_strict.py::StrictExportTestExport::test_lazy_module_kwargs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_lifted_constants_strict, test/export/test_export_strict.py::StrictExportTestExport::test_linear_conv_strict, test/export/test_export_strict.py::StrictExportTestExport::test_malformed_fqn_from_source_name_strict, test/export/test_export_strict.py::StrictExportTestExport::test_map_buffers_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_map_strict, test/export/test_export_strict.py::StrictExportTestExport::test_mask_nonzero_static_strict, test/export/test_export_strict.py::StrictExportTestExport::test_masked_select_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_math_pow_strict, test/export/test_export_strict.py::StrictExportTestExport::test_mismatched_dynamic_shapes_strict, test/export/test_export_strict.py::StrictExportTestExport::test_mixed_input_strict, test/export/test_export_strict.py::StrictExportTestExport::test_module_dict_key_strict, test/export/test_export_strict.py::StrictExportTestExport::test_module_input_strict, test/export/test_export_strict.py::StrictExportTestExport::test_module_input_subclasses_parameterization_nested_strict, test/export/test_export_strict.py::StrictExportTestExport::test_module_list_slice_strict, test/export/test_export_strict.py::StrictExportTestExport::test_module_strict, test/export/test_export_strict.py::StrictExportTestExport::test_module_with_dict_container_inp_out_strict, test/export/test_export_strict.py::StrictExportTestExport::test_modules_access_for_deleted_submodule_strict, test/export/test_export_strict.py::StrictExportTestExport::test_more_multidimensional_slicing_strict, test/export/test_export_strict.py::StrictExportTestExport::test_multidimensional_slicing_strict, test/export/test_export_strict.py::StrictExportTestExport::test_multinomial_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_multiple_definitions_same_name_dim_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nested_dynamic_shapes_spec_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nested_module_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nested_module_with_constant_buffer_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nested_module_with_init_buffer_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_nested_module_with_parameter_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nn_module_stack_shared_submodule_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nn_module_stack_strict, test/export/test_export_strict.py::StrictExportTestExport::test_no_check_is_size_error_strict, test/export/test_export_strict.py::StrictExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_strict, test/export/test_export_strict.py::StrictExportTestExport::test_no_tensor_computation_2_strict, test/export/test_export_strict.py::StrictExportTestExport::test_no_tensor_computation_3_strict, test/export/test_export_strict.py::StrictExportTestExport::test_no_tensor_computation_4_strict, test/export/test_export_strict.py::StrictExportTestExport::test_no_tensor_computation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_non_arg_name_dynamic_shapes_api_strict, test/export/test_export_strict.py::StrictExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_strict, test/export/test_export_strict.py::StrictExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_strict, test/export/test_export_strict.py::StrictExportTestExport::test_non_persistent_buffer_strict, test/export/test_export_strict.py::StrictExportTestExport::test_non_strict_dynamic_shapes_strict, test/export/test_export_strict.py::StrictExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_strict, test/export/test_export_strict.py::StrictExportTestExport::test_none_buffers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nonstrict_retrace_preserves_metadata_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nonzero_2_strict, test/export/test_export_strict.py::StrictExportTestExport::test_nonzero_dynamic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_not_registered_parameter_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_operator_aten_tensor_mode_variant_strict, test/export/test_export_strict.py::StrictExportTestExport::test_output_node_name_strict, test/export/test_export_strict.py::StrictExportTestExport::test_pad_sequence_strict, test/export/test_export_strict.py::StrictExportTestExport::test_param_util_strict, test/export/test_export_strict.py::StrictExportTestExport::test_partial_patched_forward_strict, test/export/test_export_strict.py::StrictExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_placeholder_naming_collisions_strict, test/export/test_export_strict.py::StrictExportTestExport::test_placeholder_naming_order_strict, test/export/test_export_strict.py::StrictExportTestExport::test_placeholder_naming_order_variadic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_placeholder_update_preserving_strict, test/export/test_export_strict.py::StrictExportTestExport::test_predispatch_cond_strict, test/export/test_export_strict.py::StrictExportTestExport::test_predispatch_grad_wrappers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_preserve_module_call_signature_unflatten_specialization_strict, test/export/test_export_strict.py::StrictExportTestExport::test_preserve_requires_grad_placeholders_strict, test/export/test_export_strict.py::StrictExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_profiling_code_strict, test/export/test_export_strict.py::StrictExportTestExport::test_python_asserts_with_sym_int_strict, test/export/test_export_strict.py::StrictExportTestExport::test_pytree_register_data_class_strict, test/export/test_export_strict.py::StrictExportTestExport::test_pytree_register_nested_data_class_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_range_constraints_with_replacement_strict, test/export/test_export_strict.py::StrictExportTestExport::test_real_tensor_alias_dtype_mismatch_strict, test/export/test_export_strict.py::StrictExportTestExport::test_real_tensor_bool_cast_strict, test/export/test_export_strict.py::StrictExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_strict, test/export/test_export_strict.py::StrictExportTestExport::test_real_tensor_for_max_op_strict, test/export/test_export_strict.py::StrictExportTestExport::test_real_tensor_size_mismatch_strict, test/export/test_export_strict.py::StrictExportTestExport::test_redundant_assert_max_upper_bound_strict, test/export/test_export_strict.py::StrictExportTestExport::test_redundant_asserts_strict, test/export/test_export_strict.py::StrictExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_strict, test/export/test_export_strict.py::StrictExportTestExport::test_register_constant_strict, test/export/test_export_strict.py::StrictExportTestExport::test_repeat_interleave_strict, test/export/test_export_strict.py::StrictExportTestExport::test_replace_unbacked_with_very_large_upperbound_strict, test/export/test_export_strict.py::StrictExportTestExport::test_replaced_unbacked_bindings_strict, test/export/test_export_strict.py::StrictExportTestExport::test_reshape_view_helper_strict, test/export/test_export_strict.py::StrictExportTestExport::test_retracable_ep_strict, test/export/test_export_strict.py::StrictExportTestExport::test_retrace_pre_autograd_strict, test/export/test_export_strict.py::StrictExportTestExport::test_run_decomposition_supports_user_input_mutation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_run_decompositions_keep_metadata_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_strict, test/export/test_export_strict.py::StrictExportTestExport::test_runtime_assert_for_prim_strict, test/export/test_export_strict.py::StrictExportTestExport::test_runtime_assert_for_prm_str_strict, test/export/test_export_strict.py::StrictExportTestExport::test_runtime_assert_with_size_strict, test/export/test_export_strict.py::StrictExportTestExport::test_sdpa_gqa_strict, test/export/test_export_strict.py::StrictExportTestExport::test_sequential_slicing_strict, test/export/test_export_strict.py::StrictExportTestExport::test_set_example_inputs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_set_grad_as_side_effect_strict, test/export/test_export_strict.py::StrictExportTestExport::test_set_grad_empty_strict, test/export/test_export_strict.py::StrictExportTestExport::test_set_grad_unflatten_strict, test/export/test_export_strict.py::StrictExportTestExport::test_setgrad_lifted_tensor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_shared_submodule_nn_module_stack_strict, test/export/test_export_strict.py::StrictExportTestExport::test_simple_export_for_training_strict, test/export/test_export_strict.py::StrictExportTestExport::test_simple_unbacked_view_strict, test/export/test_export_strict.py::StrictExportTestExport::test_size_input_strict, test/export/test_export_strict.py::StrictExportTestExport::test_slice_nn_module_stack_strict, test/export/test_export_strict.py::StrictExportTestExport::test_solver_unsupported_sympy_function_strict, test/export/test_export_strict.py::StrictExportTestExport::test_specialize_derived_dim_roots_strict, test/export/test_export_strict.py::StrictExportTestExport::test_split_const_gm_with_lifted_constants_strict, test/export/test_export_strict.py::StrictExportTestExport::test_stack_trace_make_fx_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_stack_trace_strict, test/export/test_export_strict.py::StrictExportTestExport::test_state_primitives_strict, test/export/test_export_strict.py::StrictExportTestExport::test_state_shape_attribute_assignment_strict, test/export/test_export_strict.py::StrictExportTestExport::test_state_tensors_strict, test/export/test_export_strict.py::StrictExportTestExport::test_static_dim_constraints_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclass_nested_attr_access_complicated_metadata_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclass_nested_attr_access_const_metadata_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclass_nested_attr_access_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclass_nested_attr_access_submodule_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclasses_parameterization_nested_strict, test/export/test_export_strict.py::StrictExportTestExport::test_subclasses_parameterization_strict, test/export/test_export_strict.py::StrictExportTestExport::test_suggest_torch_checks_with_non_negative_check_strict, test/export/test_export_strict.py::StrictExportTestExport::test_suggest_torch_checks_with_regular_check_strict, test/export/test_export_strict.py::StrictExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_strict, test/export/test_export_strict.py::StrictExportTestExport::test_suggested_fixes_new_roots_strict, test/export/test_export_strict.py::StrictExportTestExport::test_sym_float_operators_strict, test/export/test_export_strict.py::StrictExportTestExport::test_sym_or_sym_and_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_sym_sqrt_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symbool_item_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symfloat_item_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_input_additional_inputs_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_input_basic_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_input_ranges_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_input_shapes_collection_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_input_specialization_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_item_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_output_strict, test/export/test_export_strict.py::StrictExportTestExport::test_symint_tensor_return_strict, test/export/test_export_strict.py::StrictExportTestExport::test_tensor_attribute_zero_args_strict, test/export/test_export_strict.py::StrictExportTestExport::test_tensor_constant_aten_to_strict, test/export/test_export_strict.py::StrictExportTestExport::test_tensor_constant_with_wrapped_method_strict, test/export/test_export_strict.py::StrictExportTestExport::test_to_module_with_mutated_buffer_multiple_strict, test/export/test_export_strict.py::StrictExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_strict, test/export/test_export_strict.py::StrictExportTestExport::test_to_module_with_mutated_buffer_strict, test/export/test_export_strict.py::StrictExportTestExport::test_tolist_strict, test/export/test_export_strict.py::StrictExportTestExport::test_torch_check_eq_commutativity_strict, test/export/test_export_strict.py::StrictExportTestExport::test_torch_fn_strict, test/export/test_export_strict.py::StrictExportTestExport::test_trace_under_fake_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_train_eval_on_exported_preautograd_module_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_3d_matmul_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_bincount_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_bindings_for_divisible_u_symint_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_deferred_runtime_retrace_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_expand_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_infer_size_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_kth_value_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_linear_layer_norm_input_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_noncontig_lin_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_pad_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_scalar_constructor_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_slice_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_to_cond_passthrough_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_to_cond_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unbacked_unsqueeze_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_asserts_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_buffer_update_child2parent_swap_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_closure_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_isinstance_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_multiple_graphs_dispatch_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_multiple_graphs_shared_submodule_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_multiple_graphs_state_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_no_unroll_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_placeholder_update_child2parent_swap_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_5_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_6_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_buf_8_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_const_preserving_3_1_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_const_preserving_3_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_4_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_6_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_9_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_strict, 
test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unflatten_random_dag_preserving_4_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unused_aliases_strict, test/export/test_export_strict.py::StrictExportTestExport::test_unused_constant_strict, test/export/test_export_strict.py::StrictExportTestExport::test_use_embedding_twice_strict, test/export/test_export_strict.py::StrictExportTestExport::test_user_input_and_buffer_mutation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_vmap_strict, test/export/test_export_strict.py::StrictExportTestExport::test_while_loop_assert_separation_strict, test/export/test_export_strict.py::StrictExportTestExport::test_while_loop_index_assertions_strict, test/export/test_export_strict.py::StrictExportTestExport::test_while_loop_simple_strict, test/export/test_export_strict.py::StrictExportTestExport::test_while_loop_tensor_constant_idx_strict, test/export/test_export_strict.py::StrictExportTestExport::test_wrapper_module_strict
2025-09-07T07:02:08.9421606Z
2025-09-07T07:02:08.9421705Z Running export/test_schema 1/1 ... [2025-09-07 07:02:08.929897]
2025-09-07T07:02:08.9421909Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:02:08.9422350Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'export/test_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:02:08.930097]
2025-09-07T07:02:11.0490942Z
2025-09-07T07:02:11.0492322Z export/test_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_schema_1.1_5bc5c2be1cf9a94f_.log
2025-09-07T07:02:11.0494388Z Running 5 items in this shard: test/export/test_schema.py::TestSchema::test_schema_check, test/export/test_schema.py::TestSchema::test_schema_comparison, test/export/test_schema.py::TestSchema::test_schema_compatibility, test/export/test_schema.py::TestSchema::test_schema_diff, test/export/test_schema.py::TestSchema::test_thrift_schema_unchanged
2025-09-07T07:02:11.0496021Z
2025-09-07T07:02:11.0496260Z Running export/test_serdes 1/1 ... [2025-09-07 07:02:11.049086]
2025-09-07T07:02:11.0496973Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:02:11.0502677Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:02:11.049369]
2025-09-07T07:03:51.6858220Z
2025-09-07T07:03:51.6860163Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_3ff4900b8dd44a5c_.log
2025-09-07T07:03:51.6996595Z Running 832 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict,
test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_non_negative_check_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_non_negative_check_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict 2025-09-07T07:03:51.7117628Z 2025-09-07T07:03:51.7117775Z Running functorch/test_ops 3/4 ... [2025-09-07 07:03:51.687068] 2025-09-07T07:03:51.7117952Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:03:51.7118352Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:03:51.687282] 2025-09-07T07:12:46.6470382Z 2025-09-07T07:12:46.6471190Z functorch/test_ops 3/4 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_3.4_4f48a0448d6300b5_.log 2025-09-07T07:12:46.6815444Z Running 2555 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_l1_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_layer_norm_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_mse_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frac_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lt_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_scaled_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmatmul___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logcumsumexp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_layer_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_leaky_relu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_repeat_interleave_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_conj_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_unary_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reshape_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tensor_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_floor_cuda_bool, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmax_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_tensor_with_scalar_list_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_narrow_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_narrow_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_transpose_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unfold_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_channel_shuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_silu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_blackman_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_as_complex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_baddbmm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cond_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_normalize_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cosine_similarity_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_interleave_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_k0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mode_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_bag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_number_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zero__cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeNotComposableAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ScaleGradGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cartesian_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdouble_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chunk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clone_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gt_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_det_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_grad_oriented_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vander_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logdet_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_maximum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_list_of_tensors_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mode_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_movedim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_bag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_negative_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_prelu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_smooth_l1_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softplus_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_fro_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_quantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_interleave_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_neg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sgn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_gaussian_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_where_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_xlogy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_alias_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eigvals_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_no_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polar_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unflatten_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_rrelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_bartlett_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_t_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__unsafe_masked_index_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__segment_reduce_offsets_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_combinations_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isclose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_hermitian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv1d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_selu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sgn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_log_softmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cross_entropy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_smooth_l1_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_CubeGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bool_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_inv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_minimum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gaussian_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_permute_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_j1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32 2025-09-07T07:12:46.7148216Z 2025-09-07T07:12:46.7148334Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T07:12:46.7148527Z Uploading artifacts took 0.00 seconds 2025-09-07T07:12:46.7148710Z Running inductor/test_auto_functionalize 1/1 ... [2025-09-07 07:12:46.648609] 2025-09-07T07:12:46.7148979Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:12:46.7149404Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_auto_functionalize.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:12:46.648825] 2025-09-07T07:13:07.0036092Z 2025-09-07T07:13:07.0039036Z inductor/test_auto_functionalize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_auto_functionalize_1.1_7df3aead2303e983_.log 2025-09-07T07:13:07.0045568Z Running 39 items in this shard: test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_input_to_custom_op, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_output, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_default, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_none_return, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra1, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra3, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra4, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra5, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_on_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_self_as_mutate_arg, 
test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_tensorlist, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_can_auto_functionalize, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_graph_input_is_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode1_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode4_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_recompile, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_scheduling_with_multiple_mutates, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_try_use_slice, 
test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_unbacked_auto_functionalize_op 2025-09-07T07:13:07.0051238Z 2025-09-07T07:13:07.0056040Z Running inductor/test_autoheuristic 1/1 ... [2025-09-07 07:13:07.003343] 2025-09-07T07:13:07.0056239Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:07.0056750Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_autoheuristic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:07.003578] 2025-09-07T07:13:12.5466641Z 2025-09-07T07:13:12.5467421Z inductor/test_autoheuristic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_autoheuristic_1.1_b49bb06348e7b0ae_.log 2025-09-07T07:13:12.5467800Z Running 0 items in this shard: 2025-09-07T07:13:12.5467879Z 2025-09-07T07:13:12.5469368Z Running inductor/test_benchmark_fusion 1/1 ... [2025-09-07 07:13:12.546556] 2025-09-07T07:13:12.5469560Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:12.5474842Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:13:12.546890] 2025-09-07T07:13:44.6844094Z 2025-09-07T07:13:44.6851722Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_8d6af956097f75b4_.log 2025-09-07T07:13:44.6855780Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCudaTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionCudaTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionCudaTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionCudaTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu 2025-09-07T07:13:44.6858910Z 2025-09-07T07:13:44.6859040Z Running inductor/test_compile 1/1 ... 
[2025-09-07 07:13:44.684321] 2025-09-07T07:13:44.6859265Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:44.6859786Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_compile.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:13:44.684554] 2025-09-07T07:13:56.7251078Z 2025-09-07T07:13:56.7256584Z inductor/test_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_1.1_fc77b3b63358dc1a_.log 2025-09-07T07:13:56.7258345Z Running 9 items in this shard: test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_generate_debug_symbol, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_bare_module, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_export1, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_export2, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_fx, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_fx_dict_input, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_fx_tensor_return, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_make_fx, test/inductor/test_compile.py::TestStandaloneInductor::test_inductor_via_op_with_multiple_outputs 2025-09-07T07:13:56.7259516Z 2025-09-07T07:13:56.7259630Z Running inductor/test_compile_subprocess 1/2 ... [2025-09-07 07:13:56.725142] 2025-09-07T07:13:56.7259820Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:13:56.7260239Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_compile_subprocess.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:13:56.725332] 2025-09-07T07:19:24.4280677Z 2025-09-07T07:19:24.4281990Z inductor/test_compile_subprocess 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_subprocess_1.2_42bbdb2e93d8e627_.log 2025-09-07T07:19:24.4337044Z Running 418 items in this shard: test/inductor/test_compile_subprocess.py::TestSubprocess::test_progressive, test/inductor/test_compile_subprocess.py::GPUTests::test_AllenaiLongformerBase_repro_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_pack_4bit_weight_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_abs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool1d_argmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_errors_with_long_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_with_output_size_0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_max_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_pool_errors_with_long_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex10_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_inplace_permuted_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adding_tensor_offsets_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_addmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_addmv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_alexnet_prefix_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_allow_reuse_disable_if_exceed_peak_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_angle_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_override_registration_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_out_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_with_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_to_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_as_strided_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_alignment_op_name_fail_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_size_stride_op_name_pass_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_baddbmm_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_bernoulli1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bernoulli2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bmm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bool_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_both_scalars_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_computed_offsets_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_nd_tiling_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_with_different_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_use_after_remove_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_pos_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_negative_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_single_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_unbacked_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_unbacked_legacy_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_chunk_recompiles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_type_promotion_non_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_compar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_complex_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_complex_from_real_imag_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_complex_memory_overlap_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_const_int32_to_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_2d_strides_nonpositive_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_fill_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv3d_channels_last_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv3d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_bn_fuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_inference_heuristics_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_conv_shape_check_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_with_as_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_inf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_pattern_matcher_issue_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_default_layout_constraint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_fixed_layout_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_fixed_layout_sequential_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_multi_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_data_type_propogation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dense_mask_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_deterministic_codegen_with_suffix_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_diagonal_copy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div9_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_div_by_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_precision_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_presicion_accuracy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_prim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_softmax_symfloat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dont_constant_fold_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtype_mismatch_issue_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtype_sympy_expr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_fusion_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int8_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_elu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_bag_byte_unpack_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_exact_stride_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_exp2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_exp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expand_as_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expand_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fft_real_input_real_output_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fill2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_flip_cat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_repr_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_forced_buffer_realize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_boolean_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_functionalize_rng_wrappers_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gelu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_getitem_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_arange2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_misaligned_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_mutation_real_name_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_scalar_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardtanh_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_horizonal_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_as_masked_fill_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_deterministic_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_failed_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_indirect_load_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_add_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_resize_as_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_insignificant_strides_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_int8_weight_only_quant_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_isin_tensor_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_issue102546_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_kernel_names_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_kwargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_l1_loss_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_block_sizes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_broadcast_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_offset_pointwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_tensor_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_list_clearing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log_fp64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_logaddexp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_logcumsumexp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_logcumsumexp_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_logsumexp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d3_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d6_dilation_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mean_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_misaligned_address_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mix_device_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_move_arange_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mul_index_expr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mul_softmax_symfloat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_recompile_on_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_threading_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_prime_size_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_True_descending_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_True_descending_True_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_needs_contiguous_strides_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_neg_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_new_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_new_empty_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_new_ones_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_no_mega_fusion_during_lowering_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_no_op_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_norm_constant_overflow_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_one_hot_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pad_single_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_unbacked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_permute1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_philox_rand_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_u_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_digamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_entr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfinv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_exp2_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammainc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_hermite_polynomial_h_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_legendre_polynomial_p_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log1p_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_modified_bessel_i0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_modified_bessel_i1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_modified_bessel_k0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_modified_bessel_k1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtri_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_polygamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_scaled_modified_bessel_k0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_scaled_modified_bessel_k1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_sinc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_spherical_bessel_j0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_xlog1py_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_xlogy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_zeta_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_int_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pow_symfloat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_prod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_profiler_mark_wrapper_call_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_rand_like_deterministic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_generator_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction_config_limit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reflection_pad2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reflection_pad2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reinterpret_dtypeview_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_relu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_no_ops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_view_default_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_view_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_as_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reuse_buffers_with_aliasing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_searchsorted_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_select_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_setitem_with_int_parameter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sgn_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sgn_extremal_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_shape_padding_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_silu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_size_asserts_for_multi_output_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter_dtype_consistency_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_view_with_graph_break_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_backward_data_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_one_kernel_loop_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_bool_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_stable_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_transpose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_special_polygamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_failed_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_list_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_sizes_with_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_std_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_stride_preservation_with_stride_modifying_fx_pass_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_strided_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum_int_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum_keepdims_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tanh_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor_index_put_slice_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_device_constant_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_memory_format_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_topk_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transpose_add_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transposed_propagates_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_triton_kernel_bool_param_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_triu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_uint4x2_mixed_mm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_errors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unroll_small_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unsqueeze_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_bilinear2d_a_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_cat_conv_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_correction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vdd_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_as_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_as_real_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_detach_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_bwd_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_where_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_where_with_logical_op_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zero_dim_reductions_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zero_element_mutation_cuda 2025-09-07T07:19:24.4387288Z 2025-09-07T07:19:24.4387373Z Running inductor/test_config 1/1 ... [2025-09-07 07:19:24.429753] 2025-09-07T07:19:24.4387537Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:19:24.4387925Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:19:24.430012]
2025-09-07T07:19:33.8638138Z
2025-09-07T07:19:33.8639717Z inductor/test_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_config_1.1_dd1d4d1a678be280_.log
2025-09-07T07:19:33.8642005Z Running 13 items in this shard: test/inductor/test_config.py::TestInductorConfig::test_api_options, test/inductor/test_config.py::TestInductorConfig::test_codegen_skips_custom_passes, test/inductor/test_config.py::TestInductorConfig::test_compile_api, test/inductor/test_config.py::TestInductorConfig::test_compile_api_passes_config, test/inductor/test_config.py::TestInductorConfig::test_get_compiler_config, test/inductor/test_config.py::TestInductorConfig::test_hasattr, test/inductor/test_config.py::TestInductorConfig::test_invalid_backend, test/inductor/test_config.py::TestInductorConfig::test_invalid_names, test/inductor/test_config.py::TestInductorConfig::test_non_inductor_backend, test/inductor/test_config.py::TestInductorConfig::test_options_do_something, test/inductor/test_config.py::TestInductorConfig::test_patch, test/inductor/test_config.py::TestInductorConfig::test_save_load, test/inductor/test_config.py::TestInductorConfig::test_set
2025-09-07T07:19:33.8644372Z
2025-09-07T07:19:33.8644516Z Running inductor/test_control_flow 1/2 ... [2025-09-07 07:19:33.863745]
2025-09-07T07:19:33.8644768Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T07:19:33.8645330Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:19:33.864050]
2025-09-07T07:34:42.1965639Z
2025-09-07T07:34:42.1966359Z PRINTING LOG FILE of inductor/test_control_flow 1/2 (test/test-reports/inductor.test_control_flow_1.2_451e621b9be894b1_.log)
2025-09-07T07:34:42.1967382Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
2025-09-07T07:34:42.1973587Z import pkg_resources
2025-09-07T07:34:42.1973895Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-069ee0bd6f9e24d4.xml
2025-09-07T07:34:42.1974272Z ============================= test session starts ==============================
2025-09-07T07:34:42.1974577Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-09-07T07:34:42.1974852Z cachedir: .pytest_cache
2025-09-07T07:34:42.1975217Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-09-07T07:34:42.1975607Z rootdir: /var/lib/jenkins/pytorch
2025-09-07T07:34:42.1975813Z configfile: pytest.ini
2025-09-07T07:34:42.1976810Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0
2025-09-07T07:34:42.1977197Z collecting ... 
collected 467 items 2025-09-07T07:34:42.1977510Z stepcurrent: Cannot find last run test, not skipping 2025-09-07T07:34:42.2029222Z Running 246 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_aliasing_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_inductor_fx_passes_recursively_applied, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_reintepret_view_inputs_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cpu_dynamic_False, 
test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_with_int_closure_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_outer_to_inner_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_True, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_device_cuda, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_2, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2067797Z 2025-09-07T07:34:42.2067991Z inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cuda PASSED [2.9010s] [ 0%] 2025-09-07T07:34:42.2068334Z inductor/test_control_flow.py::CondTests::test_cond_aliasing_outputs PASSED [0.3638s] [ 0%] 2025-09-07T07:34:42.2068669Z inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cpu PASSED [0.5510s] [ 1%] 2025-09-07T07:34:42.2069065Z inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_False PASSED [0.6215s] [ 1%] 2025-09-07T07:34:42.2069491Z inductor/test_control_flow.py::CondTests::test_cond_inductor_fx_passes_recursively_applied PASSED [1.8007s] [ 2%] 2025-09-07T07:34:42.2069897Z inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True PASSED [1.8531s] [ 2%] 2025-09-07T07:34:42.2070340Z 
inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_False PASSED [1.2006s] [ 2%] 2025-09-07T07:34:42.2070766Z inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_True PASSED [2.3247s] [ 3%] 2025-09-07T07:34:42.2071214Z inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_False PASSED [0.8817s] [ 3%] 2025-09-07T07:34:42.2071630Z inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_True PASSED [1.9256s] [ 4%] 2025-09-07T07:34:42.2072015Z inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False PASSED [0.7491s] [ 4%] 2025-09-07T07:34:42.2072420Z inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False PASSED [0.7210s] [ 4%] 2025-09-07T07:34:42.2072806Z inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_True PASSED [1.1622s] [ 5%] 2025-09-07T07:34:42.2073216Z inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_False PASSED [0.4348s] [ 5%] 2025-09-07T07:34:42.2073633Z inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_False PASSED [0.5587s] [ 6%] 2025-09-07T07:34:42.2074034Z inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_True PASSED [1.7156s] [ 6%] 2025-09-07T07:34:42.2074487Z inductor/test_control_flow.py::CondTests::test_cond_reintepret_view_inputs_outputs PASSED [1.5288s] [ 6%] 2025-09-07T07:34:42.2074856Z inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cpu_dynamic_False PASSED [0.9076s] [ 7%] 2025-09-07T07:34:42.2075229Z inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_False PASSED [0.4058s] [ 7%] 2025-09-07T07:34:42.2075642Z 
inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_True PASSED [1.3293s] [ 8%] 2025-09-07T07:34:42.2076020Z inductor/test_control_flow.py::CondTests::test_cond_simple_with_int_closure_device_cpu PASSED [1.3754s] [ 8%] 2025-09-07T07:34:42.2076421Z inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_False PASSED [1.0205s] [ 8%] 2025-09-07T07:34:42.2076937Z inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_True PASSED [2.5298s] [ 9%] 2025-09-07T07:34:42.2077383Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cpu_dynamic_False PASSED [0.3660s] [ 9%] 2025-09-07T07:34:42.2077752Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_False PASSED [0.3664s] [ 10%] 2025-09-07T07:34:42.2078170Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_True PASSED [1.9515s] [ 10%] 2025-09-07T07:34:42.2078652Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cuda PASSED [1.4640s] [ 10%] 2025-09-07T07:34:42.2079035Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cpu PASSED [1.4611s] [ 11%] 2025-09-07T07:34:42.2079414Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cuda PASSED [1.3806s] [ 11%] 2025-09-07T07:34:42.2079767Z inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_outer_to_inner_device_cuda PASSED [1.4164s] [ 12%] 2025-09-07T07:34:42.2080306Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error PASSED [0.0318s] [ 12%] 2025-09-07T07:34:42.2080746Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_False_autograd_False PASSED [0.7405s] [ 13%] 2025-09-07T07:34:42.2081204Z 
inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_True_autograd_False PASSED [1.6587s] [ 13%] 2025-09-07T07:34:42.2106906Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.4642s] [ 13%] 2025-09-07T07:34:42.2107520Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.0530s] [ 13%] 2025-09-07T07:34:42.2108098Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True FAILED [1.0114s] [ 13%] 2025-09-07T07:34:42.2108388Z 2025-09-07T07:34:42.2108508Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.2108860Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2109104Z Traceback (most recent call last): 2025-09-07T07:34:42.2109462Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2109804Z self._run_test( 2025-09-07T07:34:42.2110084Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2110410Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2110656Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2110949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2111412Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2111629Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2112009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2112397Z return self._call_impl(*args, **kwargs) 
2025-09-07T07:34:42.2112632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2112932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2113270Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2113440Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2113767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2114153Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2114451Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2114767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2115138Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2115620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2115961Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2116366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2116754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2117094Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2117335Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2117601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2117926Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2118210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2118536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2118872Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2119071Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2119451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2119777Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2120023Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2120474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2120758Z return aot_autograd( 2025-09-07T07:34:42.2120952Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2121289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2121618Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2121881Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2122256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2122669Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2122921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2123333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2123694Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2124125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2124472Z fx_g = _create_graph( 2025-09-07T07:34:42.2124667Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2125035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2125391Z fx_g = make_fx( 2025-09-07T07:34:42.2125556Z ^^^^^^^^ 
2025-09-07T07:34:42.2125943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2126304Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2126625Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2126912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2127252Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2127466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2127835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2128186Z t = dispatch_trace( 2025-09-07T07:34:42.2128403Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2128689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2128984Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2129255Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2129591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2129909Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2130145Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2130525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2130850Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2131073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2131328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2131571Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2131767Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2132042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.2132392Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2132613Z ^^^^^^^^^ 2025-09-07T07:34:42.2132885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2133217Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2133427Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2133723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2134125Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2134345Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2134684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2135073Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2135288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2135765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2136121Z outs_pair = fn(*args) 2025-09-07T07:34:42.2136267Z ^^^^^^^^^ 2025-09-07T07:34:42.2136711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2137121Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2137466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2137844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2138172Z outs_pair = fn(*args) 2025-09-07T07:34:42.2138384Z ^^^^^^^^^ 2025-09-07T07:34:42.2138698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2139096Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2139332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2139740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2140172Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2140450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2140803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2141180Z outs_pair = fn(*args) 2025-09-07T07:34:42.2141356Z ^^^^^^^^^ 2025-09-07T07:34:42.2141696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2142123Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2142296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2142684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2143081Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2143280Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2143627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2143948Z return handle_torch_function( 2025-09-07T07:34:42.2144156Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2144447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2144789Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2145077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2145456Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2145793Z return func(*args, **kwargs) 2025-09-07T07:34:42.2146010Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2146301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2146785Z result = _engine_run_backward( 2025-09-07T07:34:42.2146960Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2147320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2147751Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2148054Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2148401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2148726Z return user_fn(self, *args) 2025-09-07T07:34:42.2148937Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2149249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2193619Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2193849Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2194092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2194339Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2194467Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2194688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2194897Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2195012Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2195254Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2195524Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2195661Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2195890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2196122Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2196255Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2196590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2196844Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2197021Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2197263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2197501Z t = dispatch_trace( 2025-09-07T07:34:42.2197608Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2197788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2197996Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2198120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2198339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2198563Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2198692Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2198931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2199227Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2199390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2199591Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2199796Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2199904Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2200120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2200439Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2200557Z ^^^^^^^^^ 2025-09-07T07:34:42.2200785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2201037Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2201170Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2201283Z File "", line 1, in 2025-09-07T07:34:42.2201520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2201792Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2201952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2202199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2202434Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2202618Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2202901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2203186Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2203305Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2203552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2203819Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2203955Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2204181Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2204414Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2204534Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2204742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2205010Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2205195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2205410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2205674Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2205823Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2206311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2206616Z leaves = list(leaves) 2025-09-07T07:34:42.2206721Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2206920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2207123Z return func(x) 2025-09-07T07:34:42.2207217Z ^^^^^^^ 2025-09-07T07:34:42.2207417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2207662Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2207808Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2208074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2208333Z return func(*args, **kwargs) 2025-09-07T07:34:42.2208443Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2208710Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2209026Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.2209150Z 2025-09-07T07:34:42.2209372Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2209625Z 2025-09-07T07:34:42.2209627Z 2025-09-07T07:34:42.2209702Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2210017Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2210252Z 2025-09-07T07:34:42.2210352Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2210567Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2210723Z inline_call [] 2025-09-07T07:34:42.2210841Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2211022Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2211262Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2211638Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.2212050Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2212292Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2212581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2212869Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2213126Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.2213423Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2213708Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2213909Z Traceback (most recent call last): 2025-09-07T07:34:42.2214155Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2214387Z self._run_test( 2025-09-07T07:34:42.2214562Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2214828Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2214969Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2215187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2215407Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2215536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2215769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2216018Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2216143Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2216366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2216678Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2216811Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2217040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2217311Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2217481Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2217714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2217973Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2218218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2218475Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2218609Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2218841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2219084Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2219234Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2219440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2219670Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2219825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2220085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2220318Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2220465Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2220683Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2220953Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2221075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2221296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2221518Z return aot_autograd( 2025-09-07T07:34:42.2221629Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2221829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2222086Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2222246Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2222503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2222802Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2222980Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2223308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2223584Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2223848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2224123Z fx_g = _create_graph( 2025-09-07T07:34:42.2224236Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2224474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2224718Z fx_g = make_fx( 2025-09-07T07:34:42.2224821Z ^^^^^^^^ 2025-09-07T07:34:42.2225035Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2225283Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2225413Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2225641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2225873Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2225990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2226230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2226689Z t = dispatch_trace( 2025-09-07T07:34:42.2226795Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2226987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2227188Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2227309Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2227512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2227728Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2227842Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2228076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2228362Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2228516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2228777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2228991Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2229104Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2229312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2229586Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2229704Z ^^^^^^^^^ 2025-09-07T07:34:42.2229909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2230128Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2230248Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2230475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2230714Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2230842Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2231071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2231334Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2231491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2231755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2232068Z outs_pair = fn(*args) 2025-09-07T07:34:42.2232173Z ^^^^^^^^^ 2025-09-07T07:34:42.2232410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2232695Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2232851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2233120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2233384Z outs_pair = fn(*args) 2025-09-07T07:34:42.2233482Z ^^^^^^^^^ 2025-09-07T07:34:42.2233719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2234003Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.2234154Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2234440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2234749Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2234906Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2235171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2235424Z outs_pair = fn(*args) 2025-09-07T07:34:42.2235530Z ^^^^^^^^^ 2025-09-07T07:34:42.2235791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2236082Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2236204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2236449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2236812Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2236940Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2237149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2237355Z return handle_torch_function( 2025-09-07T07:34:42.2237523Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2237742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2238003Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2238164Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2238422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.2238674Z return func(*args, **kwargs) 2025-09-07T07:34:42.2238784Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2238978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2239190Z result = _engine_run_backward( 2025-09-07T07:34:42.2239309Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2239533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2239846Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2240058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2240389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2240644Z return user_fn(self, *args) 2025-09-07T07:34:42.2240758Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2240980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2241212Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2241334Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2241566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2241810Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2241932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2242139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2242344Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2242466Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2242706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2242969Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2311975Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2312276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2312519Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2312659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2312910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2313165Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2313293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2313533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2313779Z t = dispatch_trace( 2025-09-07T07:34:42.2313881Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2314059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2314260Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2314382Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2314583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2314883Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2314992Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2315225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2315511Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2315671Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2315881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2316084Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2316191Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2316385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2316670Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2316781Z ^^^^^^^^^ 2025-09-07T07:34:42.2317003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2317245Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2317364Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2317477Z File "", line 1, in 2025-09-07T07:34:42.2317706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2318018Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2318187Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2318416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2318646Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2318771Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2319039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2319313Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2319431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2319676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2319935Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2320055Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2320324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2320554Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2320668Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2320877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2321143Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2321317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2321534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2321759Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2321900Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2322112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2322325Z leaves = list(leaves) 2025-09-07T07:34:42.2322428Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2322620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2322823Z return func(x) 2025-09-07T07:34:42.2322914Z ^^^^^^^ 2025-09-07T07:34:42.2323152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2323401Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2323545Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2323798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2324050Z return func(*args, **kwargs) 2025-09-07T07:34:42.2324164Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2324417Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2324720Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2324846Z 2025-09-07T07:34:42.2325057Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2325303Z 2025-09-07T07:34:42.2325306Z 2025-09-07T07:34:42.2325382Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2325695Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2325933Z 2025-09-07T07:34:42.2326022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2326256Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2326407Z inline_call [] 2025-09-07T07:34:42.2326585Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2326756Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2326943Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2327314Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2327725Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2327968Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2328252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2328536Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2328790Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2329080Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2329312Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2329457Z inline_call [] 2025-09-07T07:34:42.2329570Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2329738Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2329923Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2330287Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2330707Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2330945Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2331224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2331498Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2331817Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2332110Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2332323Z =================================== FAILURES =================================== 2025-09-07T07:34:42.2332533Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2332733Z Traceback (most recent call last): 2025-09-07T07:34:42.2332963Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2333190Z self._run_test( 2025-09-07T07:34:42.2333360Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2333563Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2333701Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2333918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2334141Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2334263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2334492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2334771Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2334896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2335111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2335328Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2335447Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2335665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2335932Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2336092Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2336328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2336647Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2336877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2337125Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2337260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2337480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2337715Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2337843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2338035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2338258Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2338406Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2338620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2338854Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2338994Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2339218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2339446Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2339571Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2339787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2340045Z return aot_autograd( 2025-09-07T07:34:42.2340149Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2340346Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2340594Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2340748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2340997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2341281Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2341452Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2341721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2341996Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2342256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2342526Z fx_g = _create_graph( 2025-09-07T07:34:42.2342630Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2342860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2343155Z fx_g = make_fx( 2025-09-07T07:34:42.2343246Z ^^^^^^^^ 2025-09-07T07:34:42.2343449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2343693Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2343811Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2344037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2344262Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2344380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2344613Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2344848Z t = dispatch_trace( 2025-09-07T07:34:42.2344947Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2345126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2345315Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2345428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2345628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2345835Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2345942Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2346173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2346455Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2346681Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2346887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2347090Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2347200Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2347391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2347597Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2347710Z ^^^^^^^^^ 2025-09-07T07:34:42.2347912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2348125Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2348281Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2348505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2348746Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2348873Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.2349099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2349364Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2349510Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2349767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2350027Z outs_pair = fn(*args) 2025-09-07T07:34:42.2350133Z ^^^^^^^^^ 2025-09-07T07:34:42.2350378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2350662Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2350814Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2351077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2351376Z outs_pair = fn(*args) 2025-09-07T07:34:42.2351481Z ^^^^^^^^^ 2025-09-07T07:34:42.2351725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2352009Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2352153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2352436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2352747Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2352903Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2353167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.2353429Z outs_pair = fn(*args) 2025-09-07T07:34:42.2353532Z ^^^^^^^^^ 2025-09-07T07:34:42.2353789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2354069Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2354191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2354441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2354706Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2354831Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2355039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2355250Z return handle_torch_function( 2025-09-07T07:34:42.2355362Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2355578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2355838Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2356000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2356257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2356582Z return func(*args, **kwargs) 2025-09-07T07:34:42.2356699Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2356948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2357156Z result = _engine_run_backward( 2025-09-07T07:34:42.2357273Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2357494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2357804Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2358014Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2358232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2358442Z return user_fn(self, *args) 2025-09-07T07:34:42.2358551Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2358769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2359000Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2359119Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2359352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2359595Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2359760Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2359960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2360274Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2360378Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2360613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2360874Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2361007Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2361225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2361454Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2361584Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2361826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.2362082Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2362206Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2362444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2362682Z t = dispatch_trace( 2025-09-07T07:34:42.2362777Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2362956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2363152Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2363270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2363468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2363672Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2363780Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2364016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2364300Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2364464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2364669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2364875Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2364976Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2365204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2365416Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2365531Z ^^^^^^^^^ 2025-09-07T07:34:42.2365752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2365994Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2366118Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2366226Z File "", line 1, in 
2025-09-07T07:34:42.2366451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2368942Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2369117Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2369341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2369564Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2369685Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2369951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2370285Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2370398Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2370639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2370892Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2371005Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2396867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2397158Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2397283Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2397498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2397761Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2397933Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2398155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2398383Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.2398523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2398731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2398938Z leaves = list(leaves) 2025-09-07T07:34:42.2409856Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2410068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2410268Z return func(x) 2025-09-07T07:34:42.2410358Z ^^^^^^^ 2025-09-07T07:34:42.2410552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2410797Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2410942Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2411193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2411440Z return func(*args, **kwargs) 2025-09-07T07:34:42.2411547Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2411795Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2412181Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2412308Z 2025-09-07T07:34:42.2413850Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2414096Z 2025-09-07T07:34:42.2414098Z 2025-09-07T07:34:42.2414172Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2414484Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2414757Z 2025-09-07T07:34:42.2414846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2415046Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2415193Z inline_call [] 2025-09-07T07:34:42.2415303Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2415473Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2415657Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2416032Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2416913Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2419687Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2419975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2420249Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2420504Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2420794Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2421022Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2421167Z inline_call [] 2025-09-07T07:34:42.2421274Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2421438Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2421624Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2421985Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2425465Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2451262Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2451541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2451811Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2452062Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2460584Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2460826Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2460973Z inline_call [] 2025-09-07T07:34:42.2461079Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2461242Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2461423Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2461859Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2462260Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2462492Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2462771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2463042Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2464501Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2464791Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2465171Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-069ee0bd6f9e24d4.xml - 2025-09-07T07:34:42.2465482Z =========================== short test summary info ============================ 2025-09-07T07:34:42.2465955Z FAILED [1.0114s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2466575Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2466698Z 2025-09-07T07:34:42.2466911Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2467152Z 2025-09-07T07:34:42.2467154Z 2025-09-07T07:34:42.2467230Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2467538Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2467768Z 2025-09-07T07:34:42.2467856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2468040Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.2470067Z ==================== 1 failed, 33 passed, 2 rerun in 44.12s ==================== 2025-09-07T07:34:42.2470207Z Got exit code 1 2025-09-07T07:34:42.2470298Z Retrying single test... 2025-09-07T07:34:42.2470805Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.2471299Z import pkg_resources 2025-09-07T07:34:42.2471531Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-d5000ca64d57adb0.xml 2025-09-07T07:34:42.2471791Z ============================= test session starts ============================== 2025-09-07T07:34:42.2472000Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.2472189Z cachedir: .pytest_cache 2025-09-07T07:34:42.2472408Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.2472640Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.2473916Z configfile: pytest.ini 2025-09-07T07:34:42.2474144Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.2474416Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.2474812Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2475119Z Running 1 items in this shard 2025-09-07T07:34:42.2475190Z 2025-09-07T07:34:42.2475391Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.3841s] [100%] 2025-09-07T07:34:42.2475824Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1164s] [100%] 2025-09-07T07:34:42.2496611Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True FAILED [1.2635s] [100%] 2025-09-07T07:34:42.2496829Z 2025-09-07T07:34:42.2496881Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.2497087Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2499050Z Traceback (most recent call last): 2025-09-07T07:34:42.2499298Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2499532Z self._run_test( 2025-09-07T07:34:42.2499700Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2499972Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2500105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2500319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2500543Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2500664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2500896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.2501138Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2502465Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2502682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2502898Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2503018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2503235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2503495Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2503649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2503877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2504114Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2504338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2504577Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2505810Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2506032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2506266Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2506390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2506678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2506896Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2507042Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2507295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2507524Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.2507665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2507885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2509302Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2509420Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2509632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2509846Z return aot_autograd( 2025-09-07T07:34:42.2509943Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2510142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2510386Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2510536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2510780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2511061Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2511224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2533658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2533935Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2534205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2534474Z fx_g = _create_graph( 2025-09-07T07:34:42.2534570Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2534802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2535039Z fx_g = make_fx( 
2025-09-07T07:34:42.2535128Z ^^^^^^^^ 2025-09-07T07:34:42.2535333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2535583Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2535704Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2537849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2538079Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2538193Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2538428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2538660Z t = dispatch_trace( 2025-09-07T07:34:42.2538754Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2538928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2539122Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2539234Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2539432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2539643Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2541389Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2541619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2541900Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2542055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2542319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2542521Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2542621Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2542812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.2543021Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2543131Z ^^^^^^^^^ 2025-09-07T07:34:42.2543333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2545636Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2545746Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2545965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2546199Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2546322Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2546657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2546915Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2547057Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2547316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2547628Z outs_pair = fn(*args) 2025-09-07T07:34:42.2547724Z ^^^^^^^^^ 2025-09-07T07:34:42.2549169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2549448Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2575723Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2576032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2576292Z outs_pair = fn(*args) 2025-09-07T07:34:42.2576395Z ^^^^^^^^^ 2025-09-07T07:34:42.2576717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2577030Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2577234Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2577689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2578053Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2580121Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2580433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2580689Z outs_pair = fn(*args) 2025-09-07T07:34:42.2580788Z ^^^^^^^^^ 2025-09-07T07:34:42.2581044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2581323Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2581462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2581709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2582006Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2582127Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2583914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2584134Z return handle_torch_function( 2025-09-07T07:34:42.2584305Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2584520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2584811Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2584966Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2585218Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2585467Z return func(*args, **kwargs) 2025-09-07T07:34:42.2585574Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2585764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2585964Z result = _engine_run_backward( 2025-09-07T07:34:42.2587410Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2587646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2587987Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2588222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2588438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2588701Z return user_fn(self, *args) 2025-09-07T07:34:42.2588805Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2589017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2589240Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2589351Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2589580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2592356Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2592485Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2592684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2592883Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2592984Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2613817Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2614071Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2614196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2614412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2614632Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2614755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2616196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2616444Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2616640Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2616872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2617108Z t = dispatch_trace( 2025-09-07T07:34:42.2617202Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2617371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2617563Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2617674Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2617868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2618066Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2619294Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2619527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2619802Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2619956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2620160Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2620357Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2620456Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2620643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2620847Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2620952Z ^^^^^^^^^ 2025-09-07T07:34:42.2621164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2622451Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2622572Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2622669Z File "", line 1, in 2025-09-07T07:34:42.2622888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2623143Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2623375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2623594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2623813Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2623931Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2624198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2624468Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2625626Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2625870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2626120Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2626237Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2626450Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2626794Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2626903Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2627104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2627361Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2627530Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2627739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2629028Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2629164Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2629370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2629573Z leaves = list(leaves) 2025-09-07T07:34:42.2629669Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2629850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2630042Z return func(x) 2025-09-07T07:34:42.2630130Z ^^^^^^^ 2025-09-07T07:34:42.2630366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2630604Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2630741Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2632085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2632333Z return func(*args, **kwargs) 2025-09-07T07:34:42.2632440Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2632687Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2632989Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.2633111Z 2025-09-07T07:34:42.2633320Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2633562Z 2025-09-07T07:34:42.2633565Z 2025-09-07T07:34:42.2633640Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2633947Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2634181Z 2025-09-07T07:34:42.2634270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2634516Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2634663Z inline_call [] 2025-09-07T07:34:42.2634769Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2635918Z inductor [] 2025-09-07T07:34:42.2636042Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2636221Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2636676Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.2637081Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2637317Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2637593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2637869Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2638122Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.2638405Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2640266Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2640466Z Traceback (most recent call last): 2025-09-07T07:34:42.2640696Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2640915Z self._run_test( 2025-09-07T07:34:42.2641080Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2641280Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2641412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2641621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2641834Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2641952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2642177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2643401Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2643570Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2643780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2643995Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2644108Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2644321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2644582Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2644737Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2644962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2645195Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2645420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2646703Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2646834Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2647051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2647278Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2647451Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2647640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2647857Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2647998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2648206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2648275Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2648316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2648458Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2648502Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2648539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2649653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2649698Z return aot_autograd( 2025-09-07T07:34:42.2649733Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2649870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2649939Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2649985Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2650147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2650232Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2650276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2650458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2650505Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2650691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2650731Z fx_g = _create_graph( 2025-09-07T07:34:42.2650766Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2650930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2651007Z fx_g = make_fx( 2025-09-07T07:34:42.2651039Z ^^^^^^^^ 2025-09-07T07:34:42.2651192Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2651239Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2651275Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2651428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2651470Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2652459Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2652618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2652656Z t = dispatch_trace( 2025-09-07T07:34:42.2652689Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2652805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2652847Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2652883Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2653008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2653049Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2653084Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2653288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2653367Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2653409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2653533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2653572Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2653607Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2653734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2653775Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2653809Z ^^^^^^^^^ 2025-09-07T07:34:42.2653943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2653987Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2654960Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2655118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2655168Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2655201Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2655358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2655422Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2655467Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2655643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2655683Z outs_pair = fn(*args) 2025-09-07T07:34:42.2655717Z ^^^^^^^^^ 2025-09-07T07:34:42.2655896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2655961Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2656006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2656178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2656217Z outs_pair = fn(*args) 2025-09-07T07:34:42.2656293Z ^^^^^^^^^ 2025-09-07T07:34:42.2656555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2656614Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.2656657Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2656850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2656925Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2656971Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2658085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2658124Z outs_pair = fn(*args) 2025-09-07T07:34:42.2658161Z ^^^^^^^^^ 2025-09-07T07:34:42.2658356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2658402Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2658437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2658608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2658704Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2658740Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2658866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2658909Z return handle_torch_function( 2025-09-07T07:34:42.2658943Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2659085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2659164Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2659209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2659376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.2659422Z return func(*args, **kwargs) 2025-09-07T07:34:42.2659458Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2659581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2659622Z result = _engine_run_backward( 2025-09-07T07:34:42.2659657Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2660727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2660851Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2660900Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2661026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2661068Z return user_fn(self, *args) 2025-09-07T07:34:42.2661103Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2661253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2661296Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2661332Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2661490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2661534Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2661569Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2661734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2661774Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2661809Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2661974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2662030Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2662069Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2662207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2662255Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2662292Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2662455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2663415Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2663454Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2663613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2663651Z t = dispatch_trace( 2025-09-07T07:34:42.2663684Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2663838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2663879Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2663916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2664039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2664077Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2664111Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2664276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2664354Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2664394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2664520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2664565Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2664598Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2664725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2664766Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2664800Z ^^^^^^^^^ 2025-09-07T07:34:42.2664948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2665914Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2665950Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2665991Z File "", line 1, in 2025-09-07T07:34:42.2666135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2666213Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2666257Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2666397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2666445Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2666560Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2666753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2666795Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2666877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2667048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2667093Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2667129Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2667272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2667317Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2667353Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2667487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2667576Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2667621Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2667749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2668738Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2668783Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2668909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2668992Z leaves = list(leaves) 2025-09-07T07:34:42.2669026Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2669149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2669184Z return func(x) 2025-09-07T07:34:42.2669216Z ^^^^^^^ 2025-09-07T07:34:42.2669353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2669417Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2669461Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2669629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2669669Z return func(*args, **kwargs) 2025-09-07T07:34:42.2669704Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2669886Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2669977Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2669980Z 2025-09-07T07:34:42.2670189Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2670191Z 2025-09-07T07:34:42.2670193Z 2025-09-07T07:34:42.2670266Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2670465Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2670468Z 2025-09-07T07:34:42.2670554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2670629Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2671597Z inline_call [] 2025-09-07T07:34:42.2671654Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2671689Z inductor [] 2025-09-07T07:34:42.2671763Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2671835Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2672094Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2672240Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2672328Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2672479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2672563Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2672702Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2672821Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2672891Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2672926Z inline_call [] 2025-09-07T07:34:42.2672981Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2673056Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2673124Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2673379Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2673489Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2673619Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2673770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2673853Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2674909Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2675032Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2675081Z =================================== FAILURES =================================== 2025-09-07T07:34:42.2675194Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2675237Z Traceback (most recent call last): 2025-09-07T07:34:42.2675395Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2675430Z self._run_test( 2025-09-07T07:34:42.2675542Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2675596Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2675636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2675769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2675818Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2675856Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2676008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2676054Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2676093Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2676233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2676277Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2676314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2676458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2676605Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2677623Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2677778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2677824Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2677976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2678035Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2678077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2678220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2678270Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2678308Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2678425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2678491Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2678537Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2678662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2678726Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2678766Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2678944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2678988Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2679025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2679162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2679201Z return aot_autograd( 2025-09-07T07:34:42.2679235Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2679373Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2680437Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2680486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2680647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2680737Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2680781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2680967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2681008Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2681199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2681238Z fx_g = _create_graph( 2025-09-07T07:34:42.2681273Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2681436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2681470Z fx_g = make_fx( 2025-09-07T07:34:42.2681505Z ^^^^^^^^ 2025-09-07T07:34:42.2681658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2681703Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2681741Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2681887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2681930Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2681999Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2682158Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2682196Z t = dispatch_trace( 2025-09-07T07:34:42.2682229Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2683273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2683317Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2683353Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2683479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2683519Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2683554Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2683718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2683798Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2683839Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2683962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2684000Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2684034Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2684160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2684235Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2684270Z ^^^^^^^^^ 2025-09-07T07:34:42.2684401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2684442Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2684476Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2684626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2684676Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2684709Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.2685787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2685852Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2685900Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2686078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2686117Z outs_pair = fn(*args) 2025-09-07T07:34:42.2686151Z ^^^^^^^^^ 2025-09-07T07:34:42.2686325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2686396Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2686441Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2686676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2686716Z outs_pair = fn(*args) 2025-09-07T07:34:42.2686749Z ^^^^^^^^^ 2025-09-07T07:34:42.2686927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2686987Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2687029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2687223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2687341Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2687387Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2687561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.2687598Z outs_pair = fn(*args) 2025-09-07T07:34:42.2687632Z ^^^^^^^^^ 2025-09-07T07:34:42.2687823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2688797Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2688833Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2689003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2689047Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2689084Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2689212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2689254Z return handle_torch_function( 2025-09-07T07:34:42.2689289Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2689434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2689552Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2689598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2689766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2689806Z return func(*args, **kwargs) 2025-09-07T07:34:42.2689841Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2689964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2690006Z result = _engine_run_backward( 2025-09-07T07:34:42.2690041Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2690187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2690307Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2690360Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2690485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2691442Z return user_fn(self, *args) 2025-09-07T07:34:42.2691478Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2691623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2691666Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2691705Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2691862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2691907Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2691942Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2692066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2692110Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2692145Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2692316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2692369Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2692407Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2692578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2692627Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2692665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2692827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.2692874Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2692915Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2693077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2693115Z t = dispatch_trace( 2025-09-07T07:34:42.2694075Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2694189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2694232Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2694270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2694394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2694432Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2694466Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2694627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2694749Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2694790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2694913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2694951Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2694984Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2695112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2695154Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2695188Z ^^^^^^^^^ 2025-09-07T07:34:42.2695341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2695390Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2695423Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2695468Z File "", line 1, in 
2025-09-07T07:34:42.2695610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2696688Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2696734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2696872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2696920Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2696958Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2697149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2697192Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2697226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2697402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2697446Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2697481Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2697624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2697665Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2697700Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2697878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2697968Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2698013Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2698140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2698203Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.2698246Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2698372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2698410Z leaves = list(leaves) 2025-09-07T07:34:42.2699856Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2699983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2700019Z return func(x) 2025-09-07T07:34:42.2700052Z ^^^^^^^ 2025-09-07T07:34:42.2700190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2700256Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2700296Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2700513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2700553Z return func(*args, **kwargs) 2025-09-07T07:34:42.2700588Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2700768Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2700854Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2700857Z 2025-09-07T07:34:42.2701066Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2701069Z 2025-09-07T07:34:42.2701070Z 2025-09-07T07:34:42.2701143Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2701340Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2701346Z 2025-09-07T07:34:42.2701432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2701506Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2701541Z inline_call [] 2025-09-07T07:34:42.2701597Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2701631Z inductor [] 2025-09-07T07:34:42.2701706Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2702713Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2702972Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2703084Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2703175Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2703327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2703413Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2703548Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2703696Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2703768Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2703801Z inline_call [] 2025-09-07T07:34:42.2703858Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2703931Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2704004Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2704259Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2704368Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2704453Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2704603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2704688Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2704819Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2704937Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2705039Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2705074Z inline_call [] 2025-09-07T07:34:42.2706047Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2706120Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2706189Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2706443Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2706656Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2706741Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2706889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2706978Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2707106Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2707224Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2707443Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-d5000ca64d57adb0.xml - 2025-09-07T07:34:42.2707500Z =========================== short test summary info ============================ 2025-09-07T07:34:42.2707869Z FAILED [1.2635s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2707958Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2707960Z 2025-09-07T07:34:42.2708170Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2708173Z 2025-09-07T07:34:42.2708175Z 2025-09-07T07:34:42.2708246Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2708491Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2708494Z 2025-09-07T07:34:42.2708578Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2708638Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.2708704Z ================== 1 failed, 245 deselected, 2 rerun in 3.94s ================== 2025-09-07T07:34:42.2708741Z Got exit code 1 2025-09-07T07:34:42.2709722Z Retrying single test... 2025-09-07T07:34:42.2710154Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.2710192Z import pkg_resources 2025-09-07T07:34:42.2710379Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0f640ae1a7e40d6a.xml 2025-09-07T07:34:42.2710440Z ============================= test session starts ============================== 2025-09-07T07:34:42.2710554Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.2710592Z cachedir: .pytest_cache 2025-09-07T07:34:42.2710790Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.2710834Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.2710873Z configfile: pytest.ini 2025-09-07T07:34:42.2711034Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.2711111Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.2711347Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2711390Z Running 1 items in this shard 2025-09-07T07:34:42.2711392Z 2025-09-07T07:34:42.2711590Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.3748s] [100%] 2025-09-07T07:34:42.2711791Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.2543s] [100%] 2025-09-07T07:34:42.2711963Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True FAILED [1.4182s] [100%] 2025-09-07T07:34:42.2711965Z 2025-09-07T07:34:42.2712014Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.2712127Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2712170Z Traceback (most recent call last): 2025-09-07T07:34:42.2712324Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2713278Z self._run_test( 2025-09-07T07:34:42.2713391Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2713453Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2713493Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2713629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2713675Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2713714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2713866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.2713951Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2713990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2714130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2714174Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2714211Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2714359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2714440Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2714479Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2714630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2714677Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2714829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2714882Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2714921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2715064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2716076Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2716116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2716232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2716299Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2716342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2716468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2716622Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.2716664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2716805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2716848Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2716884Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2717031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2717068Z return aot_autograd( 2025-09-07T07:34:42.2717103Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2717240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2717310Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2717360Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2717520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2717605Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2717649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2717835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2718805Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2718991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2719031Z fx_g = _create_graph( 2025-09-07T07:34:42.2719065Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2719273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2719309Z fx_g = make_fx( 
2025-09-07T07:34:42.2719340Z ^^^^^^^^ 2025-09-07T07:34:42.2719492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2719537Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2719575Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2719723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2719765Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2719801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2719960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2719996Z t = dispatch_trace( 2025-09-07T07:34:42.2720029Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2720199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2720241Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2720276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2720402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2720441Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2720516Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2721607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2721689Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2721729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2721854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2721895Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2721928Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2722060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.2722100Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2722135Z ^^^^^^^^^ 2025-09-07T07:34:42.2722267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2722313Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2722347Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2722497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2722545Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2722579Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2722737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2722799Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2722844Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2723022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2723059Z outs_pair = fn(*args) 2025-09-07T07:34:42.2723097Z ^^^^^^^^^ 2025-09-07T07:34:42.2723268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2724252Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2724296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2724470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2724542Z outs_pair = fn(*args) 2025-09-07T07:34:42.2724577Z ^^^^^^^^^ 2025-09-07T07:34:42.2724753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2724813Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2724854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2725050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2725121Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2725169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2725344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2725383Z outs_pair = fn(*args) 2025-09-07T07:34:42.2725416Z ^^^^^^^^^ 2025-09-07T07:34:42.2725607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2725653Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2725688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2725896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2725941Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2725977Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2726103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2727125Z return handle_torch_function( 2025-09-07T07:34:42.2727162Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2727308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2727382Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2727428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2727595Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2727642Z return func(*args, **kwargs) 2025-09-07T07:34:42.2727677Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2727801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2727841Z result = _engine_run_backward( 2025-09-07T07:34:42.2727876Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2728022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2728146Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2728195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2728321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2728362Z return user_fn(self, *args) 2025-09-07T07:34:42.2728400Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2728544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2728589Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2728624Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2728783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2728826Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2729826Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2729953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2729992Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2730026Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2730191Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2730245Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2730284Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2730424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2730472Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2730513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2730680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2730727Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2730765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2730927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2730964Z t = dispatch_trace( 2025-09-07T07:34:42.2731040Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2731153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2731195Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2731229Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2731353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2731391Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2732351Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2732517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2732596Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2732635Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2732762Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2732804Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2732838Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2732966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2733006Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2733040Z ^^^^^^^^^ 2025-09-07T07:34:42.2733189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2733241Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2733273Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2733316Z File "", line 1, in 2025-09-07T07:34:42.2733459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2733537Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2733585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2733723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2733769Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2733806Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2733997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2734080Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2735033Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2735206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2735248Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2735285Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2735430Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2735472Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2735506Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2735642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2735729Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2735777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2735901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2735960Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2736002Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2736128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2736203Z leaves = list(leaves) 2025-09-07T07:34:42.2736236Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2736360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2736394Z return func(x) 2025-09-07T07:34:42.2736426Z ^^^^^^^ 2025-09-07T07:34:42.2736628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2736695Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2736735Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2737835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2737875Z return func(*args, **kwargs) 2025-09-07T07:34:42.2737911Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2738093Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2738183Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.2738185Z 2025-09-07T07:34:42.2738392Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2738395Z 2025-09-07T07:34:42.2738397Z 2025-09-07T07:34:42.2738470Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2738665Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2738669Z 2025-09-07T07:34:42.2738753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2738827Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2738865Z inline_call [] 2025-09-07T07:34:42.2738921Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2738955Z inductor [] 2025-09-07T07:34:42.2739028Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2739099Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2739397Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.2739510Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2739595Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2739747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2739833Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2739965Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.2741003Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2741116Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2741158Z Traceback (most recent call last): 2025-09-07T07:34:42.2741313Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2741347Z self._run_test( 2025-09-07T07:34:42.2741459Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2741515Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2741555Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2741728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2741774Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2741811Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2741962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2742006Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2742044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2742180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2742223Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2742260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2742401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2742485Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2742522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2742674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2743633Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2743784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2743838Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2743877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2744018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2744068Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2744106Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2744228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2744293Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2744336Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2744461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2744524Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2746047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2746188Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2746232Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2746268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2746405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2746447Z return aot_autograd( 2025-09-07T07:34:42.2746549Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2746686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2746755Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2746800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2747903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2747988Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2748032Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2748216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2748310Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2748498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2748535Z fx_g = _create_graph( 2025-09-07T07:34:42.2748570Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2748733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2748768Z fx_g = make_fx( 2025-09-07T07:34:42.2748800Z ^^^^^^^^ 2025-09-07T07:34:42.2748953Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2748997Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2749034Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2749182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2749229Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2749266Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2749423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2749461Z t = dispatch_trace( 2025-09-07T07:34:42.2749493Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2749607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2749648Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2750619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2750745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2750784Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2750819Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2750981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2751063Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2751103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2751227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2751265Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2751299Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2751467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2751509Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2751543Z ^^^^^^^^^ 2025-09-07T07:34:42.2751677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2751716Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2751754Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2751905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2751954Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2751987Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2752143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2752204Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2753176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2753352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2753391Z outs_pair = fn(*args) 2025-09-07T07:34:42.2753425Z ^^^^^^^^^ 2025-09-07T07:34:42.2753599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2753701Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2753745Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2753917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2753956Z outs_pair = fn(*args) 2025-09-07T07:34:42.2753988Z ^^^^^^^^^ 2025-09-07T07:34:42.2754167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2754226Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.2754268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2754467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2754541Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2754585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2754758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2754795Z outs_pair = fn(*args) 2025-09-07T07:34:42.2754829Z ^^^^^^^^^ 2025-09-07T07:34:42.2755020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2755065Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2755100Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2756192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2756243Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2756280Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2756406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2756448Z return handle_torch_function( 2025-09-07T07:34:42.2756537Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2756679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2756804Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2756849Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2757016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.2757055Z return func(*args, **kwargs) 2025-09-07T07:34:42.2757091Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2757218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2757260Z result = _engine_run_backward( 2025-09-07T07:34:42.2757294Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2757440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2757560Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2757611Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2757737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2757778Z return user_fn(self, *args) 2025-09-07T07:34:42.2757813Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2758891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2758982Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2759018Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2759175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2759219Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2759254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2759380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2759418Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2759453Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2759616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2759667Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2759711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2759846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2759894Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2759931Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2760094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2760184Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2760224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2760385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2760423Z t = dispatch_trace( 2025-09-07T07:34:42.2760455Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2760570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2761541Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2761578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2761701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2761740Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2761774Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2761934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2762048Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2762089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2762213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2762251Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2762284Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2762416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2762456Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2762490Z ^^^^^^^^^ 2025-09-07T07:34:42.2762639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2762687Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2762720Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2762762Z File "", line 1, in 2025-09-07T07:34:42.2762905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2762982Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2763026Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2763162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2764182Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2764218Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2764409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2764452Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2764487Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2764660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2764705Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2764740Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2764884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2764928Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2764963Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2765095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2765182Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2765226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2765353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2765412Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2765455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2765580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2765618Z leaves = list(leaves) 2025-09-07T07:34:42.2765651Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2765778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2766781Z return func(x) 2025-09-07T07:34:42.2766815Z ^^^^^^^ 2025-09-07T07:34:42.2766954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2767019Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2767059Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2767275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2767316Z return func(*args, **kwargs) 2025-09-07T07:34:42.2767351Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2767532Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2767622Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2767624Z 2025-09-07T07:34:42.2767832Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2767834Z 2025-09-07T07:34:42.2767836Z 2025-09-07T07:34:42.2767909Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2768107Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2768110Z 2025-09-07T07:34:42.2768195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2768268Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2768303Z inline_call [] 2025-09-07T07:34:42.2768358Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2768437Z inductor [] 2025-09-07T07:34:42.2768511Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2768582Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2768842Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2769882Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2769970Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2770122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2770210Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2770342Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2770467Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2770538Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2770572Z inline_call [] 2025-09-07T07:34:42.2770628Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2770700Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2770770Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2771025Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2771134Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2771222Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2771373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2771456Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2771585Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2771738Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2771788Z =================================== FAILURES =================================== 2025-09-07T07:34:42.2771901Z _ WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2771944Z Traceback (most recent call last): 2025-09-07T07:34:42.2772096Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1248, in test_while_loop_nested_control_flow 2025-09-07T07:34:42.2773059Z self._run_test( 2025-09-07T07:34:42.2773171Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2773225Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2773265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2773398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2773444Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2773484Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2773635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2773680Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2773719Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2773854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2773948Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2773984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2774126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2774207Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2774245Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2774398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2774443Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2774593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2774644Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2774686Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2774828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2775809Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2775847Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2775962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2776027Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2776073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2776199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2776262Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2776302Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2776447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2776561Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2776599Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2776736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2776775Z return aot_autograd( 2025-09-07T07:34:42.2776809Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2776988Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2777057Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2777103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2777262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2777349Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2777392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2777574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2778544Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2778735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2778773Z fx_g = _create_graph( 2025-09-07T07:34:42.2778808Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2778971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2779005Z fx_g = make_fx( 2025-09-07T07:34:42.2779037Z ^^^^^^^^ 2025-09-07T07:34:42.2779235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2779280Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2779318Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2779466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2779508Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2779544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2779703Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2779740Z t = dispatch_trace( 2025-09-07T07:34:42.2779773Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2779886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2779926Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2779968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2780091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2780130Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2780165Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2781242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2781321Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2781364Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2781488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2781527Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2781561Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2781687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2781732Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2781766Z ^^^^^^^^^ 2025-09-07T07:34:42.2781898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2781939Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2781974Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2782123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2782205Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2782238Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.2782396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2782457Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2782501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2782680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2782719Z outs_pair = fn(*args) 2025-09-07T07:34:42.2782753Z ^^^^^^^^^ 2025-09-07T07:34:42.2783841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2783908Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2783955Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2784128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2784167Z outs_pair = fn(*args) 2025-09-07T07:34:42.2784200Z ^^^^^^^^^ 2025-09-07T07:34:42.2784377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2784472Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2784514Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2784708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2784778Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2784825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2784999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.2785037Z outs_pair = fn(*args) 2025-09-07T07:34:42.2785071Z ^^^^^^^^^ 2025-09-07T07:34:42.2785260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2785310Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2785345Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2785514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2785559Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2785596Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2785722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2786751Z return handle_torch_function( 2025-09-07T07:34:42.2786788Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2786930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2787004Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2787052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2787220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2787261Z return func(*args, **kwargs) 2025-09-07T07:34:42.2787297Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2787420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2787461Z result = _engine_run_backward( 2025-09-07T07:34:42.2787544Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2787691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2787811Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2787862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2787990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2788031Z return user_fn(self, *args) 2025-09-07T07:34:42.2788067Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2788212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2788254Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2788290Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2788449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2789416Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2789453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2789577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2789659Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2789694Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2789859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2789909Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2789948Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2790084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2790135Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2790173Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2790334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.2790380Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2790419Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2790584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2790622Z t = dispatch_trace( 2025-09-07T07:34:42.2790655Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2790770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2790811Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2790847Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2790975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2791014Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2791960Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2792122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2792199Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2792245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2792369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2792407Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2792440Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2792567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2792607Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2792673Z ^^^^^^^^^ 2025-09-07T07:34:42.2792823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2792872Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2792905Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2792945Z File "", line 1, in 
2025-09-07T07:34:42.2793093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2793170Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2793214Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2793349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2793396Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2793434Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2793627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2793669Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2794614Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2794788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2794878Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2794914Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2795058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2795099Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2795135Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2795274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2795362Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2795407Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2795533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2795597Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.2795640Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2795766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2795804Z leaves = list(leaves) 2025-09-07T07:34:42.2795837Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2795962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2795996Z return func(x) 2025-09-07T07:34:42.2796030Z ^^^^^^^ 2025-09-07T07:34:42.2796167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2796232Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2797298Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2797467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2797511Z return func(*args, **kwargs) 2025-09-07T07:34:42.2797548Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2797731Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2797816Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2797819Z 2025-09-07T07:34:42.2798075Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2798078Z 2025-09-07T07:34:42.2798080Z 2025-09-07T07:34:42.2798152Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2798347Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2798351Z 2025-09-07T07:34:42.2798437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2798509Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2798544Z inline_call [] 2025-09-07T07:34:42.2798599Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2798633Z inductor [] 2025-09-07T07:34:42.2798705Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2798777Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2799036Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2799146Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2799232Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2799435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2799520Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2800656Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2800778Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2800853Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2800887Z inline_call [] 2025-09-07T07:34:42.2800941Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2801015Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2801084Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2801350Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2801458Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2801544Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2801697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2801781Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2801910Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2802029Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2802100Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2802135Z inline_call [] 2025-09-07T07:34:42.2802188Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.2802261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2802329Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2802620Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2802729Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 801, in forward 2025-09-07T07:34:42.2802814Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, (ci, cj, a, b)) 2025-09-07T07:34:42.2803884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2803971Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2804099Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2804216Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2804430Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0f640ae1a7e40d6a.xml - 2025-09-07T07:34:42.2804490Z =========================== short test summary info ============================ 2025-09-07T07:34:42.2804855Z FAILED [1.4182s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2804940Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2804977Z 2025-09-07T07:34:42.2805184Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2805187Z 2025-09-07T07:34:42.2805189Z 2025-09-07T07:34:42.2805260Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2805457Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.2805459Z 2025-09-07T07:34:42.2805543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2805602Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.2805667Z ================== 1 failed, 245 deselected, 2 rerun in 4.23s ================== 2025-09-07T07:34:42.2805703Z Got exit code 1 2025-09-07T07:34:42.2805827Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.2806250Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.2806288Z import pkg_resources 2025-09-07T07:34:42.2806459Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4fa8e2f1cfa57fe5.xml 2025-09-07T07:34:42.2806592Z ============================= test session starts ============================== 2025-09-07T07:34:42.2806706Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.2807675Z cachedir: .pytest_cache 2025-09-07T07:34:42.2807838Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.2807882Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.2807920Z configfile: pytest.ini 2025-09-07T07:34:42.2808081Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.2808157Z collecting ... collected 467 items / 34 deselected / 433 selected 2025-09-07T07:34:42.2808207Z stepcurrent: skipping 34 already run items. 
2025-09-07T07:34:42.2808294Z Running 212 items in this shard 2025-09-07T07:34:42.2808296Z 2025-09-07T07:34:42.2808474Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_False PASSED [1.4935s] [ 0%] 2025-09-07T07:34:42.2808645Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_False PASSED [1.7148s] [ 0%] 2025-09-07T07:34:42.2808842Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7301s] [ 1%] 2025-09-07T07:34:42.2809037Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7388s] [ 1%] 2025-09-07T07:34:42.2809207Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True FAILED [0.7492s] [ 1%] 2025-09-07T07:34:42.2809211Z 2025-09-07T07:34:42.2809260Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.2809370Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2809413Z Traceback (most recent call last): 2025-09-07T07:34:42.2809568Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.2809658Z self._run_test( 2025-09-07T07:34:42.2809770Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2809824Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2809865Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2810930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2810978Z return super().__call__(*args, **kwargs) 
2025-09-07T07:34:42.2811019Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2811170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2811215Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2811253Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2811389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2811438Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2811475Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2811617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2811696Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2811734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2811888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2811934Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2812084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2812137Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2812176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2812323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2812373Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2812412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2812528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2812593Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2813592Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2813728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2813791Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2813833Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2813972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2814018Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2814053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2814191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2814229Z return aot_autograd( 2025-09-07T07:34:42.2814263Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2814400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2814468Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2814513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2814673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2814791Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2814835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2815022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2815064Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2815252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2815292Z fx_g = _create_graph( 
2025-09-07T07:34:42.2815328Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2815491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2816465Z fx_g = make_fx( 2025-09-07T07:34:42.2816557Z ^^^^^^^^ 2025-09-07T07:34:42.2816711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2816763Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2816800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2816947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2816990Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2817025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2817187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2817223Z t = dispatch_trace( 2025-09-07T07:34:42.2817257Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2817370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2817412Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2817446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2817576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2817616Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2817651Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2817814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2817893Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2817983Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2818110Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2819077Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2819112Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2819243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2819284Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2819319Z ^^^^^^^^^ 2025-09-07T07:34:42.2819451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2819491Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2819525Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2819675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2819726Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2819759Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2819916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2819978Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2820022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2820244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2820282Z outs_pair = fn(*args) 2025-09-07T07:34:42.2820316Z ^^^^^^^^^ 2025-09-07T07:34:42.2820488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2820555Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2820600Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2820772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.2820810Z outs_pair = fn(*args) 2025-09-07T07:34:42.2821755Z ^^^^^^^^^ 2025-09-07T07:34:42.2821936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2822000Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2822041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2822237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2822307Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2822352Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2822525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2822563Z outs_pair = fn(*args) 2025-09-07T07:34:42.2822596Z ^^^^^^^^^ 2025-09-07T07:34:42.2822784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2822832Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2822868Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2823037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2823081Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2823118Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2823280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2823323Z return handle_torch_function( 2025-09-07T07:34:42.2823358Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2823499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", 
line 1728, in handle_torch_function 2025-09-07T07:34:42.2823573Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2824532Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2824699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2824740Z return func(*args, **kwargs) 2025-09-07T07:34:42.2824775Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2824899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2824940Z result = _engine_run_backward( 2025-09-07T07:34:42.2824978Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2825124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2825244Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2825293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2825457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2825498Z return user_fn(self, *args) 2025-09-07T07:34:42.2825534Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2825678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2825721Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2825756Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2825914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2825959Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2825994Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2826118Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2826159Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2826195Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2827341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2827395Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2827434Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2827571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2827621Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2827660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2827826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2827874Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2827912Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2828076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2828113Z t = dispatch_trace( 2025-09-07T07:34:42.2828147Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2828260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2828303Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2828337Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2828507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2828546Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2828579Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2828741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 
2025-09-07T07:34:42.2828818Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2828862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2829907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2829946Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2829979Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2830107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2830147Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2830184Z ^^^^^^^^^ 2025-09-07T07:34:42.2830333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2830382Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2830415Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2830456Z File "", line 1, in 2025-09-07T07:34:42.2830600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2830725Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2830769Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2830906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2830952Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2830990Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2831182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2831225Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2831260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2831430Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2831476Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2831512Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2832572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2832615Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2832649Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2832786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2832874Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2832919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2833044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2833104Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2833151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2833276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2833315Z leaves = list(leaves) 2025-09-07T07:34:42.2833348Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2833472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2833506Z return func(x) 2025-09-07T07:34:42.2833539Z ^^^^^^^ 2025-09-07T07:34:42.2833721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2833786Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2833828Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2833994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.2834037Z return func(*args, **kwargs) 2025-09-07T07:34:42.2834073Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2834259Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2835261Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2835263Z 2025-09-07T07:34:42.2835471Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2835473Z 2025-09-07T07:34:42.2835475Z 2025-09-07T07:34:42.2835548Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2835747Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2835784Z 2025-09-07T07:34:42.2835869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2835943Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2835978Z inline_call [] 2025-09-07T07:34:42.2836032Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2836107Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2836177Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2836437Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.2836621Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2836706Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2836863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2836949Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2837081Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.2837202Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2837313Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2837357Z Traceback (most recent call last): 2025-09-07T07:34:42.2837509Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.2837544Z self._run_test( 2025-09-07T07:34:42.2838584Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2838644Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2838683Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2838816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2838861Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2838899Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2839093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2839140Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2839178Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2839314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2839356Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2839393Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2839538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2839618Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2839656Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2839810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2839856Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2840007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2840061Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2840100Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2840289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2840390Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2841362Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2841478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2841543Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2841585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2841715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2841777Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2841818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2841961Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2842004Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2842045Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2842183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2842221Z return aot_autograd( 2025-09-07T07:34:42.2842256Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2842392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2842461Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2842507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2842674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2842758Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2842803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2842989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2843031Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2844125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2844168Z fx_g = _create_graph( 2025-09-07T07:34:42.2844202Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2844399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2844433Z fx_g = make_fx( 2025-09-07T07:34:42.2844465Z ^^^^^^^^ 2025-09-07T07:34:42.2844615Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2844660Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2844699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2844844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2844887Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2844922Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2845080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2845116Z t = dispatch_trace( 2025-09-07T07:34:42.2845150Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2845263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2845304Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2845338Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2845463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2845539Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2845574Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2845743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2846807Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2846848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2846973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2847014Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2847049Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2847175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2847215Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2847248Z ^^^^^^^^^ 2025-09-07T07:34:42.2847379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2847424Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2847458Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2847607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2847656Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2847690Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2847847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2847909Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2847952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2848126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2848167Z outs_pair = fn(*args) 2025-09-07T07:34:42.2848201Z ^^^^^^^^^ 2025-09-07T07:34:42.2848372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2849356Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2849400Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2849638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2849676Z outs_pair = fn(*args) 2025-09-07T07:34:42.2849710Z ^^^^^^^^^ 2025-09-07T07:34:42.2849886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2849945Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.2849986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2850181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2850250Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2850296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2850467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2850507Z outs_pair = fn(*args) 2025-09-07T07:34:42.2850540Z ^^^^^^^^^ 2025-09-07T07:34:42.2850729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2850773Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2850809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2851026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2851071Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2851107Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2851232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2851274Z return handle_torch_function( 2025-09-07T07:34:42.2852229Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2852373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2852450Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2852497Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2852664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.2852711Z return func(*args, **kwargs) 2025-09-07T07:34:42.2852746Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2852870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2852911Z result = _engine_run_backward( 2025-09-07T07:34:42.2852946Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2853091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2853215Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2853264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2853391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2853436Z return user_fn(self, *args) 2025-09-07T07:34:42.2853473Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2853617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2853660Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2853695Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2853853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2853928Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2854874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2854999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2855038Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2855072Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2855237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2855290Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2855329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2855468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2855519Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2855557Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2855721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2855769Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2855807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2855967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2856042Z t = dispatch_trace( 2025-09-07T07:34:42.2856076Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2856188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2856231Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2856266Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2856390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2856427Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2856463Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2857609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2857689Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2857728Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2857851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2857894Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2857928Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2858053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2858094Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2858127Z ^^^^^^^^^ 2025-09-07T07:34:42.2858279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2858328Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2858362Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2858403Z File "", line 1, in 2025-09-07T07:34:42.2858545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2858622Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2858669Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2858806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2858852Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2858890Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2859080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2859169Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2859204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2860292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2860336Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2860373Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2860518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2860560Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2860594Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2860728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2860815Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2860863Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2860988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2861048Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2861090Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2861220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2861301Z leaves = list(leaves) 2025-09-07T07:34:42.2861335Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2861457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2861493Z return func(x) 2025-09-07T07:34:42.2861524Z ^^^^^^^ 2025-09-07T07:34:42.2861662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2861728Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2861769Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2862852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2862894Z return func(*args, **kwargs) 2025-09-07T07:34:42.2862929Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2863115Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2863200Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2863202Z 2025-09-07T07:34:42.2863408Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2863412Z 2025-09-07T07:34:42.2863414Z 2025-09-07T07:34:42.2863488Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2863686Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2863689Z 2025-09-07T07:34:42.2863772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2863846Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2863881Z inline_call [] 2025-09-07T07:34:42.2863936Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2864009Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2864081Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2864373Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2864486Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2864569Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2864723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2864811Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2864942Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2865062Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2866051Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2866085Z inline_call [] 2025-09-07T07:34:42.2866138Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2866213Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2866282Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2866606Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2866764Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2866846Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2866997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2867080Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2867212Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2867333Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2867382Z =================================== FAILURES =================================== 2025-09-07T07:34:42.2867493Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2867536Z Traceback (most recent call last): 2025-09-07T07:34:42.2867693Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.2867727Z self._run_test( 2025-09-07T07:34:42.2867840Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2867893Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2867933Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2868066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2868112Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2869081Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2869234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2869279Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2869323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2869458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2869501Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2869537Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2869679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2869758Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2869837Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2869989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2870035Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2870185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2870241Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2870281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2870423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2870474Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2870512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2870630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2870695Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2870739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2870865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2871841Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2871925Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2872066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2872109Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2872146Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2872283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2872322Z return aot_autograd( 2025-09-07T07:34:42.2872359Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2872495Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2872563Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2872607Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2872766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2872854Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2872898Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2873085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2873127Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2873317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2873355Z fx_g = _create_graph( 2025-09-07T07:34:42.2873390Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2873553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2873590Z fx_g = make_fx( 2025-09-07T07:34:42.2874539Z ^^^^^^^^ 2025-09-07T07:34:42.2874692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2874736Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2874773Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2874919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2874961Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2875025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2875184Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2875221Z t = dispatch_trace( 2025-09-07T07:34:42.2875254Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2875366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2875411Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2875446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2875571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2875611Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2875645Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2875807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2875886Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2875926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2876052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2876090Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2876124Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2877277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2877319Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2877354Z ^^^^^^^^^ 2025-09-07T07:34:42.2877488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2877528Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2877561Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2877714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2877763Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2877796Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.2877952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2878013Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2878061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2878236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2878275Z outs_pair = fn(*args) 2025-09-07T07:34:42.2878309Z ^^^^^^^^^ 2025-09-07T07:34:42.2878480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2878549Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2878593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2878766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2878804Z outs_pair = fn(*args) 2025-09-07T07:34:42.2878837Z ^^^^^^^^^ 2025-09-07T07:34:42.2879939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2879999Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2880041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2880317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2880433Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2880478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2880652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.2880690Z outs_pair = fn(*args) 2025-09-07T07:34:42.2880723Z ^^^^^^^^^ 2025-09-07T07:34:42.2880914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2880959Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2880994Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2881163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2881208Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2881245Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2881370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2881412Z return handle_torch_function( 2025-09-07T07:34:42.2881447Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2881587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2881703Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2881747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2881914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2882882Z return func(*args, **kwargs) 2025-09-07T07:34:42.2882918Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2883048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2883090Z result = _engine_run_backward( 2025-09-07T07:34:42.2883124Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2883270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2883391Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2883446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2883572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2883613Z return user_fn(self, *args) 2025-09-07T07:34:42.2883648Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2883792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2883837Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2883873Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2884030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2884073Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2884108Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2884235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2884273Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2884307Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2884471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2885429Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2885469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2885641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2885690Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2885728Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2885890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.2885939Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2885977Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2886135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2886172Z t = dispatch_trace( 2025-09-07T07:34:42.2886205Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2886318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2886359Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2886397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2886579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2886618Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2886651Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2886813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2886933Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2886974Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2887097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2887136Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2888125Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2888256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2888297Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2888332Z ^^^^^^^^^ 2025-09-07T07:34:42.2888481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2888529Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2888562Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2888609Z File "", line 1, in 
2025-09-07T07:34:42.2888756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2888835Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2888880Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2889016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2889064Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2889102Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2889294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2889336Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2889372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2889545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2889589Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2889625Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2889772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2889813Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2890813Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2890949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2891037Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2891081Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2891206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2891266Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.2891309Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2891434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2891472Z leaves = list(leaves) 2025-09-07T07:34:42.2891505Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2891633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2891667Z return func(x) 2025-09-07T07:34:42.2891700Z ^^^^^^^ 2025-09-07T07:34:42.2891836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2891900Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2891941Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2892145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2892185Z return func(*args, **kwargs) 2025-09-07T07:34:42.2892220Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2892401Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2892485Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2892487Z 2025-09-07T07:34:42.2893617Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2893620Z 2025-09-07T07:34:42.2893622Z 2025-09-07T07:34:42.2893696Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2893895Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2893902Z 2025-09-07T07:34:42.2893987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2894060Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2894094Z inline_call [] 2025-09-07T07:34:42.2894147Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2894221Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2894296Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2894556Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2894668Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2894755Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2894907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2894992Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2895123Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2895280Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2895351Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2895385Z inline_call [] 2025-09-07T07:34:42.2895438Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2895509Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2895582Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2895834Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2896962Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2897043Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2897197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2897280Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2897410Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2897529Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2897648Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2897681Z inline_call [] 2025-09-07T07:34:42.2897734Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2897805Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2897876Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2898133Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2898241Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2898320Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2898469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2898555Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2898685Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2898802Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2899022Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4fa8e2f1cfa57fe5.xml - 2025-09-07T07:34:42.2899079Z =========================== short test summary info ============================ 2025-09-07T07:34:42.2899440Z FAILED [0.7492s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2900452Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2900456Z 2025-09-07T07:34:42.2900663Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2900665Z 2025-09-07T07:34:42.2900667Z 2025-09-07T07:34:42.2900738Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2900992Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2900994Z 2025-09-07T07:34:42.2901081Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2901140Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.2901210Z ============= 1 failed, 2 passed, 34 deselected, 2 rerun in 5.61s ============== 2025-09-07T07:34:42.2901249Z Got exit code 1 2025-09-07T07:34:42.2901287Z Retrying single test... 2025-09-07T07:34:42.2901714Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.2901752Z import pkg_resources 2025-09-07T07:34:42.2901922Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4eb4c694c3cab272.xml 2025-09-07T07:34:42.2901978Z ============================= test session starts ============================== 2025-09-07T07:34:42.2902090Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.2902129Z cachedir: .pytest_cache 2025-09-07T07:34:42.2902286Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.2902366Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.2902404Z configfile: pytest.ini 2025-09-07T07:34:42.2902564Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.2902640Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.2902875Z stepcurrent: skipping 36 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2903841Z Running 1 items in this shard 2025-09-07T07:34:42.2903843Z 2025-09-07T07:34:42.2904042Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.8994s] [100%] 2025-09-07T07:34:42.2904240Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.6852s] [100%] 2025-09-07T07:34:42.2904411Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True FAILED [0.6858s] [100%] 2025-09-07T07:34:42.2904413Z 2025-09-07T07:34:42.2904460Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.2904572Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2904618Z Traceback (most recent call last): 2025-09-07T07:34:42.2904773Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.2904807Z self._run_test( 2025-09-07T07:34:42.2904921Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2904975Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2905020Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2905156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2905201Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2905240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2905391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.2905474Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2905513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2905650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2905693Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2905730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2905875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2906937Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2906976Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2907129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2907174Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2907327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2907379Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2907419Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2907560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2907611Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2907699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2907815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2907880Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2907924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2908050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2908115Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.2908155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2908296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2908340Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2908377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2908522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2908561Z return aot_autograd( 2025-09-07T07:34:42.2909525Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2909663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2909732Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2909776Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2909938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2910021Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2910066Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2910249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2910296Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2910481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2910520Z fx_g = _create_graph( 2025-09-07T07:34:42.2910554Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2910762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2910796Z fx_g = make_fx( 
2025-09-07T07:34:42.2910829Z ^^^^^^^^ 2025-09-07T07:34:42.2910981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2911027Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2911063Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2911213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2911254Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2911290Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2911449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2912410Z t = dispatch_trace( 2025-09-07T07:34:42.2912444Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2912561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2912602Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2912637Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2912761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2912801Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2912871Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2913031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2913111Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2913150Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2913276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2913313Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2913349Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2913475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.2913516Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2913549Z ^^^^^^^^^ 2025-09-07T07:34:42.2913682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2913724Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2913759Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2913907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2914871Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2914904Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2915062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2915125Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2915169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2915347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2915386Z outs_pair = fn(*args) 2025-09-07T07:34:42.2915424Z ^^^^^^^^^ 2025-09-07T07:34:42.2915595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2915661Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2915705Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2915876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2915946Z outs_pair = fn(*args) 2025-09-07T07:34:42.2915980Z ^^^^^^^^^ 2025-09-07T07:34:42.2916157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2916216Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2916257Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2916454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2916587Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2916633Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2916805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2916843Z outs_pair = fn(*args) 2025-09-07T07:34:42.2917807Z ^^^^^^^^^ 2025-09-07T07:34:42.2917998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2918042Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2918078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2918246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2918337Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2918373Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2918500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2918542Z return handle_torch_function( 2025-09-07T07:34:42.2918578Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2918722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2918796Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2918841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2919007Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2919052Z return func(*args, **kwargs) 2025-09-07T07:34:42.2919088Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2919210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2919251Z result = _engine_run_backward( 2025-09-07T07:34:42.2919286Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2919432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2919554Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2920579Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2920707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2920748Z return user_fn(self, *args) 2025-09-07T07:34:42.2920789Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2920936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2920978Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2921014Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2921171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2921215Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2921294Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2921418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2921457Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2921491Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2921657Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2921711Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2921751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2921886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2921936Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2921973Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2922136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2922182Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2922222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2923307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2923346Z t = dispatch_trace( 2025-09-07T07:34:42.2923416Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2923531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2923572Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2923608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2923731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2923770Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2923803Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2923967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2924045Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2924086Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2924209Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2924249Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2924283Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2924408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2924449Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2924482Z ^^^^^^^^^ 2025-09-07T07:34:42.2924631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2924683Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2924717Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2924758Z File "", line 1, in 2025-09-07T07:34:42.2925819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2925897Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2925942Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2926082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2926129Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2926166Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2926357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2926399Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2926466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2926698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2926742Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2926778Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2926923Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2926967Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2927002Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2927136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2927224Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2927269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2927395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2927456Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2927498Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2928630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2928724Z leaves = list(leaves) 2025-09-07T07:34:42.2928758Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2928881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2928916Z return func(x) 2025-09-07T07:34:42.2928948Z ^^^^^^^ 2025-09-07T07:34:42.2929086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2929150Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2929194Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2929360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2929400Z return func(*args, **kwargs) 2025-09-07T07:34:42.2929435Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2929615Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2929705Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.2929707Z 2025-09-07T07:34:42.2929917Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2929919Z 2025-09-07T07:34:42.2929921Z 2025-09-07T07:34:42.2929993Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2930195Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2930197Z 2025-09-07T07:34:42.2930282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2930357Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2930394Z inline_call [] 2025-09-07T07:34:42.2930449Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2930482Z inductor [] 2025-09-07T07:34:42.2931483Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2931555Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2931813Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.2931963Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2932046Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2932197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2932281Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2932418Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.2932538Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2932647Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2932691Z Traceback (most recent call last): 2025-09-07T07:34:42.2932847Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.2932882Z self._run_test( 2025-09-07T07:34:42.2932995Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2933049Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2933089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2933262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2933308Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2933347Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2933495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2933541Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2934500Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2934643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2934687Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2934724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2934865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2934952Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2934990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2935140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2935185Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2935334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2935387Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2935428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2935569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2935619Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2935658Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2935774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2935843Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2935886Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2936012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2936074Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2936115Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2936285Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2937326Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2937364Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2937503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2937545Z return aot_autograd( 2025-09-07T07:34:42.2937580Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2937716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2937784Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2937830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2937990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2938075Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2938120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2938302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2938392Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2938577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2938617Z fx_g = _create_graph( 2025-09-07T07:34:42.2938651Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2938816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2938851Z fx_g = make_fx( 2025-09-07T07:34:42.2938882Z ^^^^^^^^ 2025-09-07T07:34:42.2939036Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2939080Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2940046Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2940194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2940241Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2940276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2940434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2940471Z t = dispatch_trace( 2025-09-07T07:34:42.2940504Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2940616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2940657Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2940693Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2940818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2940856Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2940891Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2941051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2941132Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2941172Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2941296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2941333Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2941367Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2941534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2941575Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2941609Z ^^^^^^^^^ 2025-09-07T07:34:42.2942660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2942699Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2942733Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2942884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2942933Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2942966Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2943122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2943184Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2943230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2943405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2943443Z outs_pair = fn(*args) 2025-09-07T07:34:42.2943477Z ^^^^^^^^^ 2025-09-07T07:34:42.2943648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2943750Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2943793Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2943966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2944003Z outs_pair = fn(*args) 2025-09-07T07:34:42.2944038Z ^^^^^^^^^ 2025-09-07T07:34:42.2944217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2944276Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.2944317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2945421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2945498Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2945543Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2945718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2945756Z outs_pair = fn(*args) 2025-09-07T07:34:42.2945789Z ^^^^^^^^^ 2025-09-07T07:34:42.2945980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2946025Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2946061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2946229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2946277Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2946313Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2946437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2946480Z return handle_torch_function( 2025-09-07T07:34:42.2946576Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2946718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2946832Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2946878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2947044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.2947085Z return func(*args, **kwargs) 2025-09-07T07:34:42.2947119Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2947244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2948210Z result = _engine_run_backward( 2025-09-07T07:34:42.2948247Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2948392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2948517Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2948568Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2948695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2948736Z return user_fn(self, *args) 2025-09-07T07:34:42.2948772Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2948916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2949008Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2949043Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2949201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2949246Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2949283Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2949407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2949446Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2949480Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2949643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2949695Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2949733Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2949872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2950839Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2950878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2951039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.2951086Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2951126Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2951286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2951323Z t = dispatch_trace( 2025-09-07T07:34:42.2951357Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2951469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2951517Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2951552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2951676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2951713Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2951747Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2951907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2952019Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2952059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2952184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2952221Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2952255Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2952382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2952423Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2953376Z ^^^^^^^^^ 2025-09-07T07:34:42.2953527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2953575Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2953608Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2953649Z File "", line 1, in 2025-09-07T07:34:42.2953793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2953871Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2953915Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2954051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2954139Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2954177Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2954374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2954417Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2954452Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2954625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2954668Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2954705Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2954847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2954889Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2954926Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.2955059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2955146Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2956108Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2956234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2956296Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.2956338Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2956464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2956560Z leaves = list(leaves) 2025-09-07T07:34:42.2956595Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2956723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2956757Z return func(x) 2025-09-07T07:34:42.2956788Z ^^^^^^^ 2025-09-07T07:34:42.2956924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2956989Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2957030Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2957242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2957282Z return func(*args, **kwargs) 2025-09-07T07:34:42.2957318Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2957499Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2957583Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2957590Z 2025-09-07T07:34:42.2957795Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2957798Z 2025-09-07T07:34:42.2957801Z 2025-09-07T07:34:42.2957872Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2958070Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2958073Z 2025-09-07T07:34:42.2959080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2959155Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2959188Z inline_call [] 2025-09-07T07:34:42.2959242Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2959319Z inductor [] 2025-09-07T07:34:42.2959392Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2959464Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2959721Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2959832Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2959917Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2960069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2960183Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2960320Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2960445Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2960516Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2960549Z inline_call [] 2025-09-07T07:34:42.2960603Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2960675Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2960746Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2961001Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2961110Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2961191Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2962269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2962353Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2962483Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2962600Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2962700Z =================================== FAILURES =================================== 2025-09-07T07:34:42.2962810Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.2962854Z Traceback (most recent call last): 2025-09-07T07:34:42.2963005Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.2963044Z self._run_test( 2025-09-07T07:34:42.2963154Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.2963209Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.2963248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2963384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.2963428Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.2963469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2963619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.2963665Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.2963703Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2963840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.2963915Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.2963950Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2965015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.2965096Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.2965134Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2965287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.2965332Z raise BackendCompilerFailed( 2025-09-07T07:34:42.2965482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.2965533Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2965572Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2965724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.2965774Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.2965812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2965928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.2965993Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.2966038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2966169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.2966231Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.2966273Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2966413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.2966460Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.2966639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2966776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.2966814Z return aot_autograd( 2025-09-07T07:34:42.2967770Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.2967961Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.2968031Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.2968076Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2968239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.2968321Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.2968369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2968550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.2968593Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.2968777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.2968818Z fx_g = _create_graph( 2025-09-07T07:34:42.2968852Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2969015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.2969049Z fx_g = make_fx( 2025-09-07T07:34:42.2969080Z ^^^^^^^^ 2025-09-07T07:34:42.2969235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.2969330Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.2969369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2969518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.2969560Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.2969595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2975227Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2975278Z t = dispatch_trace( 2025-09-07T07:34:42.2975312Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2975428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2975469Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2975504Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2975637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2975675Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2975711Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2975872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2975952Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2975991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2976117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2976154Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2976188Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2976314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2976361Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2976394Z ^^^^^^^^^ 2025-09-07T07:34:42.2976583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.2976623Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.2976657Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2976809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2977969Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2978065Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.2978229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.2978292Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.2978335Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2978514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2978554Z outs_pair = fn(*args) 2025-09-07T07:34:42.2978589Z ^^^^^^^^^ 2025-09-07T07:34:42.2978762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.2978830Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.2978873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2979048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.2979086Z outs_pair = fn(*args) 2025-09-07T07:34:42.2979120Z ^^^^^^^^^ 2025-09-07T07:34:42.2979297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.2979400Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.2979441Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2979637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.2979707Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.2979753Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2979928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.2980899Z outs_pair = fn(*args) 2025-09-07T07:34:42.2980934Z ^^^^^^^^^ 2025-09-07T07:34:42.2981127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2981176Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2981212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2981379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.2981424Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.2981459Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2981589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.2981630Z return handle_torch_function( 2025-09-07T07:34:42.2981665Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2981807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.2981881Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.2981926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2982095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2982136Z return func(*args, **kwargs) 2025-09-07T07:34:42.2982170Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2982294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.2982336Z result = _engine_run_backward( 2025-09-07T07:34:42.2982406Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2982553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.2982673Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2983638Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2983770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.2983810Z return user_fn(self, *args) 2025-09-07T07:34:42.2983846Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2983989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.2984032Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.2984067Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2984227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.2984269Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.2984305Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2984427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2984466Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2984536Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2984701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.2984751Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.2984790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2984926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.2984975Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.2985014Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2985174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.2985222Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.2986171Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2986337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.2986374Z t = dispatch_trace( 2025-09-07T07:34:42.2986408Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2986596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.2986639Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.2986674Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2986800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2986837Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2986872Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2987031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.2987109Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.2988661Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2988788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.2988825Z return fn(*args, **kwargs) 2025-09-07T07:34:42.2988859Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2988985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.2989025Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.2989059Z ^^^^^^^^^ 2025-09-07T07:34:42.2989254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.2989305Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.2990642Z ^^^^^^^^^^^ 2025-09-07T07:34:42.2990689Z File "", line 1, in 
2025-09-07T07:34:42.2990834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.2990930Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.2990975Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2991111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.2991159Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.2991196Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2991391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.2991433Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.2991469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2991640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.2991722Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.2991757Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2991901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.2991943Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.2991978Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2992112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.2992202Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.2992249Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2992375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.2993418Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.2993466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2993594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.2993631Z leaves = list(leaves) 2025-09-07T07:34:42.2993665Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.2993788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.2993821Z return func(x) 2025-09-07T07:34:42.2993855Z ^^^^^^^ 2025-09-07T07:34:42.2993996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.2994062Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.2994102Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2994270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.2994390Z return func(*args, **kwargs) 2025-09-07T07:34:42.2994425Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.2994607Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.2994692Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.2994695Z 2025-09-07T07:34:42.2994941Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.2994944Z 2025-09-07T07:34:42.2994946Z 2025-09-07T07:34:42.2995020Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.2995218Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.2995222Z 2025-09-07T07:34:42.2995307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.2995382Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2996377Z inline_call [] 2025-09-07T07:34:42.2996433Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2996468Z inductor [] 2025-09-07T07:34:42.2996609Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2996684Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2996945Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2997059Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2997140Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2997321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2997405Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2997537Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2997657Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2997729Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2997763Z inline_call [] 2025-09-07T07:34:42.2997817Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.2997888Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.2997958Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.2998213Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.2998324Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.2998404Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.2998554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.2998637Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.2999727Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.2999846Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.2999916Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.2999982Z inline_call [] 2025-09-07T07:34:42.3000034Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3000105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3000235Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3000488Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3000635Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3000715Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3000863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3000945Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3001076Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3001193Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3001410Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4eb4c694c3cab272.xml - 2025-09-07T07:34:42.3001469Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3001836Z FAILED [0.6858s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3001920Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3001939Z 2025-09-07T07:34:42.3002148Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3002150Z 2025-09-07T07:34:42.3002152Z 2025-09-07T07:34:42.3002223Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3002419Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3002422Z 2025-09-07T07:34:42.3003473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3003534Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3003600Z ================== 1 failed, 245 deselected, 2 rerun in 2.71s ================== 2025-09-07T07:34:42.3003634Z Got exit code 1 2025-09-07T07:34:42.3003672Z Retrying single test... 2025-09-07T07:34:42.3004101Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3004140Z import pkg_resources 2025-09-07T07:34:42.3004311Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1c6ac9beff6458c9.xml 2025-09-07T07:34:42.3004368Z ============================= test session starts ============================== 2025-09-07T07:34:42.3004482Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3004520Z cachedir: .pytest_cache 2025-09-07T07:34:42.3004677Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3004722Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3004782Z configfile: pytest.ini 2025-09-07T07:34:42.3004942Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3005018Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3005248Z stepcurrent: skipping 36 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3005320Z Running 1 items in this shard 2025-09-07T07:34:42.3005323Z 2025-09-07T07:34:42.3005519Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9812s] [100%] 2025-09-07T07:34:42.3005713Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.6818s] [100%] 2025-09-07T07:34:42.3005883Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True FAILED [0.7726s] [100%] 2025-09-07T07:34:42.3005886Z 2025-09-07T07:34:42.3006944Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3007057Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3007100Z Traceback (most recent call last): 2025-09-07T07:34:42.3007259Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.3007293Z self._run_test( 2025-09-07T07:34:42.3007406Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3007461Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3007501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3007663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3007711Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3007749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3007900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.3007946Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3007984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3008122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3008165Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3008202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3008343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3008425Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3008464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3008616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3008662Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3009761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3009817Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3009857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3009999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3010049Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3010088Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3010210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3010302Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3010344Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3010472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3010534Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.3010612Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3010754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3010799Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3010835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3010974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3011014Z return aot_autograd( 2025-09-07T07:34:42.3011049Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3011184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3011253Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3011298Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3012399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3012483Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3012527Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3012709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3012775Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3012963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3013003Z fx_g = _create_graph( 2025-09-07T07:34:42.3013036Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3013200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3013234Z fx_g = make_fx( 
2025-09-07T07:34:42.3013267Z ^^^^^^^^ 2025-09-07T07:34:42.3013419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3013464Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3013501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3013646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3013694Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3013729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3013889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3013926Z t = dispatch_trace( 2025-09-07T07:34:42.3013959Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3014071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3014114Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3015077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3015204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3015243Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3015279Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3015439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3015539Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3015580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3015705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3015743Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3015778Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3015939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.3015981Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3016015Z ^^^^^^^^^ 2025-09-07T07:34:42.3016149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3016188Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3016225Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3016376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3016425Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3016459Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3016686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3016749Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3016794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3017918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3017957Z outs_pair = fn(*args) 2025-09-07T07:34:42.3017991Z ^^^^^^^^^ 2025-09-07T07:34:42.3018163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3018259Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3018303Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3018476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3018514Z outs_pair = fn(*args) 2025-09-07T07:34:42.3018548Z ^^^^^^^^^ 2025-09-07T07:34:42.3018727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3018786Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3018828Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3019024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3019098Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3019145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3019317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3019355Z outs_pair = fn(*args) 2025-09-07T07:34:42.3019388Z ^^^^^^^^^ 2025-09-07T07:34:42.3019581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3019626Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3019662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3020762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3020812Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3020870Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3020997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3021038Z return handle_torch_function( 2025-09-07T07:34:42.3021074Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3021215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3021326Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3021371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3021542Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3021583Z return func(*args, **kwargs) 2025-09-07T07:34:42.3021617Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3021745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3021785Z result = _engine_run_backward( 2025-09-07T07:34:42.3021821Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3021966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3022087Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3022137Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3022264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3022304Z return user_fn(self, *args) 2025-09-07T07:34:42.3022340Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3022483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3023476Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3023512Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3023670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3023713Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3023749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3023874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3023913Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3023946Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3024111Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3024161Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3024204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3024344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3024393Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3024431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3024592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3024639Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3024680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3024841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3024877Z t = dispatch_trace( 2025-09-07T07:34:42.3024911Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3025024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3026022Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3026057Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3026182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3026219Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3026254Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3026412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3026609Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3026649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3026774Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3026810Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3026844Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3026970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3027012Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3027046Z ^^^^^^^^^ 2025-09-07T07:34:42.3027196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3027244Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3027278Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3027321Z File "", line 1, in 2025-09-07T07:34:42.3027465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3027544Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3027587Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3027723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3028744Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3028781Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3028973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3029016Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3029051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3029224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3029267Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3029304Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3029447Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3029492Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3029527Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3029661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3029748Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3029794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3029921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3029981Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3030023Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3030149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3030186Z leaves = list(leaves) 2025-09-07T07:34:42.3030220Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3030383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3031355Z return func(x) 2025-09-07T07:34:42.3031387Z ^^^^^^^ 2025-09-07T07:34:42.3031526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3031590Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3031631Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3031830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3031871Z return func(*args, **kwargs) 2025-09-07T07:34:42.3031906Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3032090Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3032179Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.3032183Z 2025-09-07T07:34:42.3032390Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3032392Z 2025-09-07T07:34:42.3032394Z 2025-09-07T07:34:42.3032465Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3032666Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3032668Z 2025-09-07T07:34:42.3032752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3032827Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3032861Z inline_call [] 2025-09-07T07:34:42.3032915Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3032966Z inductor [] 2025-09-07T07:34:42.3033039Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3033111Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3033370Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3033485Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3034505Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3034658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3034743Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3034873Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3034998Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3035109Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3035152Z Traceback (most recent call last): 2025-09-07T07:34:42.3035304Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.3035338Z self._run_test( 2025-09-07T07:34:42.3035452Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3035506Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3035546Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3035677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3035726Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3035788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3035939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3035984Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3036022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3036158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3036234Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3036270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3037438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3037520Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3037559Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3037712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3037757Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3037907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3037959Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3037998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3038143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3038192Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3038231Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3038346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3038442Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3038487Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3038614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3038676Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3038716Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3038859Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3038902Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3038939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3039075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3039114Z return aot_autograd( 2025-09-07T07:34:42.3040084Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3040276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3040346Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3040393Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3040553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3040636Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3040682Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3040865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3040907Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3041093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3041159Z fx_g = _create_graph( 2025-09-07T07:34:42.3041194Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3041358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3041392Z fx_g = make_fx( 2025-09-07T07:34:42.3041423Z ^^^^^^^^ 2025-09-07T07:34:42.3041609Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3041655Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3041692Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3041839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3041882Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3041917Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3043033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3043071Z t = dispatch_trace( 2025-09-07T07:34:42.3043104Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3043218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3043259Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3043293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3043422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3043462Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3043496Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3043659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3043736Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3043797Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3043921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3043960Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3043994Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3044120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3044160Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3044197Z ^^^^^^^^^ 2025-09-07T07:34:42.3044329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3044369Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3044403Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3044552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3045538Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3045572Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3045729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3045792Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3045835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3046012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3046050Z outs_pair = fn(*args) 2025-09-07T07:34:42.3046085Z ^^^^^^^^^ 2025-09-07T07:34:42.3046261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3046326Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3046390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3046631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3046670Z outs_pair = fn(*args) 2025-09-07T07:34:42.3046703Z ^^^^^^^^^ 2025-09-07T07:34:42.3046881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3046980Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3047022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3047217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3047287Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3047335Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3047508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3047545Z outs_pair = fn(*args) 2025-09-07T07:34:42.3048639Z ^^^^^^^^^ 2025-09-07T07:34:42.3048830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3048878Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3048913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3049082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3049126Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3049163Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3049315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3049359Z return handle_torch_function( 2025-09-07T07:34:42.3049394Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3049538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3049613Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3049658Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3049878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3049919Z return func(*args, **kwargs) 2025-09-07T07:34:42.3049954Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3050079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3050120Z result = _engine_run_backward( 2025-09-07T07:34:42.3050157Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3050303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3050422Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3051424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3051555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3051595Z return user_fn(self, *args) 2025-09-07T07:34:42.3051630Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3051775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3051817Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3051856Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3052039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3052132Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3052167Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3052291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3052329Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3052396Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3052563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3052614Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3052652Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3052830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3052882Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3052921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3053081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3053128Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3053166Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3054316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3054355Z t = dispatch_trace( 2025-09-07T07:34:42.3054389Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3054502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3054544Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3054578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3054731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3054768Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3054802Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3054963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3055041Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3055083Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3057070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3057109Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3057143Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3057272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3057316Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3057351Z ^^^^^^^^^ 2025-09-07T07:34:42.3057502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3057550Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3057583Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3058600Z File "", line 1, in 2025-09-07T07:34:42.3058747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3058825Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3058870Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3059006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3059052Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3059119Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3059311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3059354Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3059388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3059560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3059643Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3059680Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3059824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3059865Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3059901Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3060036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3060168Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3060214Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3060342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3060401Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3060446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3061566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3061605Z leaves = list(leaves) 2025-09-07T07:34:42.3061638Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3061762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3061825Z return func(x) 2025-09-07T07:34:42.3061859Z ^^^^^^^ 2025-09-07T07:34:42.3061996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3062061Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3062101Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3062273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3062315Z return func(*args, **kwargs) 2025-09-07T07:34:42.3062351Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3062533Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3062618Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3062621Z 2025-09-07T07:34:42.3062875Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3062878Z 2025-09-07T07:34:42.3062880Z 2025-09-07T07:34:42.3062953Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3063151Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3063153Z 2025-09-07T07:34:42.3063283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3063357Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3063392Z inline_call [] 2025-09-07T07:34:42.3063445Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3064419Z inductor [] 2025-09-07T07:34:42.3064494Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3064614Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3064893Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3065006Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3065087Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3065269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3065355Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3065486Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3065604Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3065676Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3065710Z inline_call [] 2025-09-07T07:34:42.3065764Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3065882Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3065953Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3066208Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3066319Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3066399Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3066609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3066719Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3066851Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3066969Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3068058Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3068174Z _ WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3068218Z Traceback (most recent call last): 2025-09-07T07:34:42.3068371Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1230, in test_while_loop_simple_control_flow 2025-09-07T07:34:42.3068405Z self._run_test( 2025-09-07T07:34:42.3068516Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3068615Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3068655Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3068788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3068834Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3068872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3069023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3069068Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3069107Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3069244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3069287Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3069325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3069492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3069572Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3069610Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3069765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3070834Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3070986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3071039Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3071077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3071220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3071273Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3071311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3071426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3071495Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3071538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3071665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3071728Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3071771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3071912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3071955Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3072009Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3072146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3072185Z return aot_autograd( 2025-09-07T07:34:42.3072219Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3072355Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3072422Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3072470Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3073627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3073714Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3073758Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3073944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3073986Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3074174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3074212Z fx_g = _create_graph( 2025-09-07T07:34:42.3074247Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3074413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3074447Z fx_g = make_fx( 2025-09-07T07:34:42.3074478Z ^^^^^^^^ 2025-09-07T07:34:42.3074630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3074675Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3074731Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3074878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3074921Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3074956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3075115Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3075152Z t = dispatch_trace( 2025-09-07T07:34:42.3075224Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3075338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3076409Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3076446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3076642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3076683Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3076762Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3076925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3077003Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3077044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3077170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3077209Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3077242Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3077368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3077409Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3077443Z ^^^^^^^^^ 2025-09-07T07:34:42.3077603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3077645Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3077679Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3077831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3077880Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3077913Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3078071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3078132Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3079174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3079351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3079394Z outs_pair = fn(*args) 2025-09-07T07:34:42.3079427Z ^^^^^^^^^ 2025-09-07T07:34:42.3079601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3079667Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3079711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3079887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3079924Z outs_pair = fn(*args) 2025-09-07T07:34:42.3079957Z ^^^^^^^^^ 2025-09-07T07:34:42.3080135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3080267Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3080311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3080533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3080603Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3080648Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3080859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3080897Z outs_pair = fn(*args) 2025-09-07T07:34:42.3080931Z ^^^^^^^^^ 2025-09-07T07:34:42.3081120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3081165Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3081201Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3082317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3082363Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3082399Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3082523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3082566Z return handle_torch_function( 2025-09-07T07:34:42.3082603Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3082747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3082821Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3082865Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3083032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3083096Z return func(*args, **kwargs) 2025-09-07T07:34:42.3083130Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3083255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3083295Z result = _engine_run_backward( 2025-09-07T07:34:42.3083330Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3083478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3083599Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3083648Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3083774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3083818Z return user_fn(self, *args) 2025-09-07T07:34:42.3083854Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3084933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3084978Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3085013Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3085172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3085217Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3085252Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3085377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3085414Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3085449Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3085617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3085691Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3085729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3085867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3085915Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3085953Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3086143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3086192Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3086230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3086389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3086427Z t = dispatch_trace( 2025-09-07T07:34:42.3086461Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3086636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3087626Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3087661Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3087786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3087824Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3087861Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3088022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3088101Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3088142Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3088266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3088338Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3088372Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3088499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3088539Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3088573Z ^^^^^^^^^ 2025-09-07T07:34:42.3088726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3088775Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3088808Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3088849Z File "", line 1, in 
2025-09-07T07:34:42.3088992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3089069Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3089114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3090188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3090235Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3090272Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3090465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3090509Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3090544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3090715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3090757Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3090796Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3090965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3091008Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3091042Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3091177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3091266Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3091357Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3091486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3091548Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3091591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3091723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3091763Z leaves = list(leaves) 2025-09-07T07:34:42.3091796Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3091924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3092891Z return func(x) 2025-09-07T07:34:42.3092924Z ^^^^^^^ 2025-09-07T07:34:42.3093062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.3093129Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3093169Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3093341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3093380Z return func(*args, **kwargs) 2025-09-07T07:34:42.3093439Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3093621Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3093707Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3093709Z 2025-09-07T07:34:42.3093921Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3093923Z 2025-09-07T07:34:42.3093925Z 2025-09-07T07:34:42.3093998Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3094199Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3094201Z 2025-09-07T07:34:42.3094286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3094359Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3094395Z inline_call [] 2025-09-07T07:34:42.3094452Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3094486Z inductor [] 2025-09-07T07:34:42.3094559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3094631Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3094892Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3095937Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3096020Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3096172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3096279Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3096417Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3096602Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3096675Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3096708Z inline_call [] 2025-09-07T07:34:42.3096806Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3096878Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3096948Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3097208Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3097324Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3097406Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3097564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3097646Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3097780Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3097899Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3097969Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3098002Z inline_call [] 2025-09-07T07:34:42.3098055Z stats [('calls_captured', 8), ('unique_graphs', 1)] 2025-09-07T07:34:42.3098150Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3099174Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3099431Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3099540Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 781, in forward 2025-09-07T07:34:42.3099622Z return torch._higher_order_ops.while_loop(cond_fn, body_fn, [ci, a, b]) 2025-09-07T07:34:42.3099771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3099854Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3099985Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3100105Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3100322Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1c6ac9beff6458c9.xml - 2025-09-07T07:34:42.3100379Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3100752Z FAILED [0.7726s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3100835Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3100837Z 2025-09-07T07:34:42.3101044Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3101074Z 2025-09-07T07:34:42.3101076Z 2025-09-07T07:34:42.3101146Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3101348Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3101350Z 2025-09-07T07:34:42.3101434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3101524Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.3101590Z ================== 1 failed, 245 deselected, 2 rerun in 2.74s ================== 2025-09-07T07:34:42.3101625Z Got exit code 1 2025-09-07T07:34:42.3101754Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.3102178Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3103162Z import pkg_resources 2025-09-07T07:34:42.3103331Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-6963000781352540.xml 2025-09-07T07:34:42.3103389Z ============================= test session starts ============================== 2025-09-07T07:34:42.3103502Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3103540Z cachedir: .pytest_cache 2025-09-07T07:34:42.3103702Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3103745Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3103803Z configfile: pytest.ini 2025-09-07T07:34:42.3103963Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3104039Z collecting ... collected 467 items / 37 deselected / 430 selected 2025-09-07T07:34:42.3104089Z stepcurrent: skipping 37 already run items. 
2025-09-07T07:34:42.3104131Z Running 209 items in this shard 2025-09-07T07:34:42.3104134Z 2025-09-07T07:34:42.3104319Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_False PASSED [1.7732s] [ 0%] 2025-09-07T07:34:42.3104493Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_False PASSED [1.7650s] [ 0%] 2025-09-07T07:34:42.3104649Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cpu_dynamic_True PASSED [1.1083s] [ 1%] 2025-09-07T07:34:42.3104807Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_False PASSED [0.9939s] [ 1%] 2025-09-07T07:34:42.3104961Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_True PASSED [0.9860s] [ 2%] 2025-09-07T07:34:42.3105215Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_False [W907 07:21:30.575441024 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3105323Z [W907 07:21:31.601555767 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3105426Z [W907 07:21:31.617092778 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3105526Z [W907 07:21:31.063356693 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3106731Z [W907 07:21:31.126786796 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3106866Z [W907 07:21:31.170339732 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3106965Z [W907 07:21:31.201791487 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3107001Z PASSED [0.7215s] [ 2%] 2025-09-07T07:34:42.3107252Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:31.357992859 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3107388Z [W907 07:21:31.381269694 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3107437Z ('RERUN', {'yellow': True}) [0.3490s] [ 3%] 2025-09-07T07:34:42.3107681Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:32.681449299 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3107784Z [W907 07:21:32.693028547 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3107830Z ('RERUN', {'yellow': True}) [0.2414s] [ 3%] 2025-09-07T07:34:42.3108077Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:32.925576611 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3108175Z [W907 07:21:32.936057676 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3108213Z FAILED [0.2443s] [ 3%] 2025-09-07T07:34:42.3108215Z 2025-09-07T07:34:42.3108264Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3108369Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3108413Z Traceback (most recent call last): 2025-09-07T07:34:42.3108557Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3108615Z self._run_test( 2025-09-07T07:34:42.3108730Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3108785Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3108826Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3108959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3109962Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3110001Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3110155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3110199Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3110238Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3110373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3110420Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3110456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3110600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3110681Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3110717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3110875Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3110920Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3111071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3111124Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3111166Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3111327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3111377Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3111416Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3111531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3111629Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3112605Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3112737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3112801Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3112841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3112983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3113028Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3113065Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3113205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3113245Z return aot_autograd( 2025-09-07T07:34:42.3113279Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3113418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.3113486Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3113531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3113691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3113792Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3113838Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3114021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3114064Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3114252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3114291Z fx_g = _create_graph( 2025-09-07T07:34:42.3114326Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3114489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3115457Z fx_g = make_fx( 2025-09-07T07:34:42.3115489Z ^^^^^^^^ 2025-09-07T07:34:42.3115645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3115691Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3115730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3115877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3115919Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3115956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3116115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.3116152Z t = dispatch_trace( 2025-09-07T07:34:42.3116185Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3116299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3116339Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3116375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3116599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3116640Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3116674Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3116837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3116916Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3116995Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3117121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3118107Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3118143Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3118273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3118315Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3118349Z ^^^^^^^^^ 2025-09-07T07:34:42.3118481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3118521Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3118555Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3118703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3118754Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3118787Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3118944Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3119005Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3119049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3119249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3119288Z outs_pair = fn(*args) 2025-09-07T07:34:42.3119322Z ^^^^^^^^^ 2025-09-07T07:34:42.3119495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3119560Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3119606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3119777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3119815Z outs_pair = fn(*args) 2025-09-07T07:34:42.3120865Z ^^^^^^^^^ 2025-09-07T07:34:42.3121045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3121107Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3121148Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3121343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3121413Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3121458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3121632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3121668Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.3121703Z ^^^^^^^^^ 2025-09-07T07:34:42.3121899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3121970Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3122005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3122174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3122218Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3122255Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3122410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3122452Z return handle_torch_function( 2025-09-07T07:34:42.3122488Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3122628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3122703Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3122752Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3123860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3123900Z return func(*args, **kwargs) 2025-09-07T07:34:42.3123936Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3124059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3124105Z result = _engine_run_backward( 2025-09-07T07:34:42.3124140Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3124288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3124408Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.3124457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3124606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3124647Z return user_fn(self, *args) 2025-09-07T07:34:42.3124683Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3124830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3124873Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3124911Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3125068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3125112Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3125147Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3125270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3125309Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3125345Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3126439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3126564Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3126604Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3126743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3126791Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3126830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3126990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3127036Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.3127078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3127265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3127303Z t = dispatch_trace( 2025-09-07T07:34:42.3127336Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3127450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3127491Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3127527Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3127686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3127726Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3127759Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3127920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3128000Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3128043Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3128166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3129148Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3129183Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3129310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3129350Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3129387Z ^^^^^^^^^ 2025-09-07T07:34:42.3129539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3129588Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3129621Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3129661Z File "", line 1, in 2025-09-07T07:34:42.3129832Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3129911Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3129956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3130094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3130141Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3130179Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3130371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3130414Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3130449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3130620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3130667Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3130703Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3131778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3131820Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3131855Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3131992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3132083Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3132127Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3132253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3132315Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.3132377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3132502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3132540Z leaves = list(leaves) 2025-09-07T07:34:42.3132574Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3132697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3132731Z return func(x) 2025-09-07T07:34:42.3132789Z ^^^^^^^ 2025-09-07T07:34:42.3132927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3132990Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3133032Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3133204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3133247Z return func(*args, **kwargs) 2025-09-07T07:34:42.3133282Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3133463Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3134477Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3134480Z 2025-09-07T07:34:42.3134689Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3134692Z 2025-09-07T07:34:42.3134694Z 2025-09-07T07:34:42.3134771Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3134956Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3134992Z 2025-09-07T07:34:42.3135078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3135151Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3135186Z inline_call [] 2025-09-07T07:34:42.3135239Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3135312Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3135385Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3135650Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3135764Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3135816Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3135969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3136055Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3136187Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3136306Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3136408Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3136450Z Traceback (most recent call last): 2025-09-07T07:34:42.3136651Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3136685Z self._run_test( 2025-09-07T07:34:42.3137757Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3137814Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3137885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3138019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3138065Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3138103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3138254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3138332Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3138371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3138507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3138550Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3138586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3138730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3138810Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3138850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3139006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", 
line 2196, in _call_user_compiler 2025-09-07T07:34:42.3139051Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3139205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3139258Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3139297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3139438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3139510Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3140496Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3140613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3140679Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3140722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3140849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3140913Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3140954Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3141094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3141138Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3141174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3141315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3141355Z return aot_autograd( 2025-09-07T07:34:42.3141389Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3141524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3141591Z cg = aot_module_simplified(gm, example_inputs, 
**self.kwargs) 2025-09-07T07:34:42.3141638Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3141798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3141881Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3141924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3142108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3142168Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3143283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3143323Z fx_g = _create_graph( 2025-09-07T07:34:42.3143358Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3143552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3143587Z fx_g = make_fx( 2025-09-07T07:34:42.3143618Z ^^^^^^^^ 2025-09-07T07:34:42.3143771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3143815Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3143854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3144002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3144045Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3144079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3144238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3144275Z t = dispatch_trace( 2025-09-07T07:34:42.3144308Z ^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3144422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3144462Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3144498Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3144622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3144661Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3144712Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3144874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3145880Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3145922Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3146046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3146087Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3146120Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3146247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3146288Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3146322Z ^^^^^^^^^ 2025-09-07T07:34:42.3146453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3146567Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3146601Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3146750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3146799Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3146833Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3146993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3147055Z out, out_descs = 
call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3147098Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3147274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3147313Z outs_pair = fn(*args) 2025-09-07T07:34:42.3147376Z ^^^^^^^^^ 2025-09-07T07:34:42.3147549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3148552Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3148596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3148775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3148862Z outs_pair = fn(*args) 2025-09-07T07:34:42.3148896Z ^^^^^^^^^ 2025-09-07T07:34:42.3149074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3149132Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3149174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3149373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3149443Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3149487Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3149659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3149697Z outs_pair = fn(*args) 2025-09-07T07:34:42.3149732Z ^^^^^^^^^ 2025-09-07T07:34:42.3149921Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3149965Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3150000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3150193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3150239Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3150276Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3150402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3150444Z return handle_torch_function( 2025-09-07T07:34:42.3151416Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3151562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3151635Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3151680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3151851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3151895Z return func(*args, **kwargs) 2025-09-07T07:34:42.3151930Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3152053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3152093Z result = _engine_run_backward( 2025-09-07T07:34:42.3152128Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3152272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3152394Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3152443Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3152569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3152610Z return user_fn(self, *args) 2025-09-07T07:34:42.3152676Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3152820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3152863Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3152899Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3153055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3153099Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3153170Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3154236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3154274Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3154310Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3154474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3154527Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3154566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3154704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3154751Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3154790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3154953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3155001Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3155039Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3155198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3155235Z t = dispatch_trace( 2025-09-07T07:34:42.3155291Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3155404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3155446Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3155482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3155606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3155644Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3155678Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3156954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3157038Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3157078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3157202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3157244Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3157277Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3157404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3157445Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3157478Z ^^^^^^^^^ 2025-09-07T07:34:42.3157628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3157679Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3157712Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3157754Z File "", line 1, in 2025-09-07T07:34:42.3157918Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3157998Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3158044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3158214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3158259Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3158297Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3158492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3158571Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3158607Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3159739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3159784Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3159821Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3159965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3160007Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3160042Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3160266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3160355Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3160403Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3160529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3160588Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.3160632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3160756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3160826Z leaves = list(leaves) 2025-09-07T07:34:42.3160859Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3160984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3161018Z return func(x) 2025-09-07T07:34:42.3161050Z ^^^^^^^ 2025-09-07T07:34:42.3161188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3161254Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3161295Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3162402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3162443Z return func(*args, **kwargs) 2025-09-07T07:34:42.3162479Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3162660Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3162746Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3162749Z 2025-09-07T07:34:42.3162955Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3162958Z 2025-09-07T07:34:42.3162959Z 2025-09-07T07:34:42.3163035Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3163219Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3163221Z 2025-09-07T07:34:42.3163305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3163378Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3163434Z inline_call [] 2025-09-07T07:34:42.3163487Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3163560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3163631Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3163889Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3164031Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3164083Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3164234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3164319Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3164454Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3164573Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3164643Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3165621Z inline_call [] 2025-09-07T07:34:42.3165675Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3165750Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3165820Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3166073Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3166186Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3166255Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3166405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3166588Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3166719Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3166840Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3166889Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3166988Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3167031Z Traceback (most recent call last): 2025-09-07T07:34:42.3167167Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3167206Z self._run_test( 2025-09-07T07:34:42.3167317Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3167372Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3167412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3167544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3167591Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3168585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3168738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3168784Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3168821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3168956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3169036Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3169074Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3169216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3169296Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3169334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3169526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3169573Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3169726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3169778Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3169818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3169961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3170012Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3170052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3170171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3170235Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3170280Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3170406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3171407Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3171449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3171618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3171662Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3171698Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3171836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3171874Z return aot_autograd( 2025-09-07T07:34:42.3171908Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3172047Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3172116Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3172160Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3172321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3172405Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3172452Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3172634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3172677Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3172870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3172910Z fx_g = _create_graph( 2025-09-07T07:34:42.3172944Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3173107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3173140Z fx_g = make_fx( 2025-09-07T07:34:42.3174238Z ^^^^^^^^ 2025-09-07T07:34:42.3174394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3174457Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3174494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3174640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3174682Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3174717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3174907Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3174945Z t = dispatch_trace( 2025-09-07T07:34:42.3174978Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3175092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3175132Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3175167Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3175294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3175332Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3175367Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3175529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3175608Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3175649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3175776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3175813Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3175847Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3177005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3177083Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3177117Z ^^^^^^^^^ 2025-09-07T07:34:42.3177251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3177290Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3177325Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3177472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3177524Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3177557Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3177714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3177775Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3177819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3177998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3178037Z outs_pair = fn(*args) 2025-09-07T07:34:42.3178071Z ^^^^^^^^^ 2025-09-07T07:34:42.3178241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3178307Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3178353Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3178525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3178562Z outs_pair = fn(*args) 2025-09-07T07:34:42.3178596Z ^^^^^^^^^ 2025-09-07T07:34:42.3178773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3179815Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3179856Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3180052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3180121Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3180209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3180382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3180421Z outs_pair = fn(*args) 2025-09-07T07:34:42.3180454Z ^^^^^^^^^ 2025-09-07T07:34:42.3180643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3180690Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3180726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3180894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3180940Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3180976Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3181102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3181144Z return handle_torch_function( 2025-09-07T07:34:42.3181180Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3181320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3181394Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3181458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3181625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3182604Z return func(*args, **kwargs) 2025-09-07T07:34:42.3182640Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3182765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3182806Z result = _engine_run_backward( 2025-09-07T07:34:42.3182844Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3182990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3183110Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3183159Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3183289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3183330Z return user_fn(self, *args) 2025-09-07T07:34:42.3183365Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3183508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3183551Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3183587Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3183746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3183789Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3183825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3183948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3184006Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3184040Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3184205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3184255Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3185225Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3185361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3185443Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3185481Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3185644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3185690Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3185730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3185890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3185928Z t = dispatch_trace( 2025-09-07T07:34:42.3185961Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3186075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3186116Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3186152Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3186277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3186315Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3186350Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3186653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3186732Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3186799Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3186923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3186961Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3187962Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3188090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3188131Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3188169Z ^^^^^^^^^ 2025-09-07T07:34:42.3188318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3188366Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3188400Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3188440Z File "", line 1, in 
2025-09-07T07:34:42.3188586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3188664Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3188709Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3188847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3188893Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3188933Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3189126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3189168Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3189203Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3189373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3189442Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3189478Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3189621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3189663Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3190631Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3190825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3190915Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3190959Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3191083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3191144Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3191187Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3191312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3191349Z leaves = list(leaves) 2025-09-07T07:34:42.3191384Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3191506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3191541Z return func(x) 2025-09-07T07:34:42.3191574Z ^^^^^^^ 2025-09-07T07:34:42.3191713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3191776Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3191817Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3191985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3192047Z return func(*args, **kwargs) 2025-09-07T07:34:42.3192082Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3192263Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3192348Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3192350Z 2025-09-07T07:34:42.3193498Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3193501Z 2025-09-07T07:34:42.3193502Z 2025-09-07T07:34:42.3193575Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3193761Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3193768Z 2025-09-07T07:34:42.3193852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3193926Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3193960Z inline_call [] 2025-09-07T07:34:42.3194014Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3194087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3194158Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3194417Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3194533Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3194584Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3194736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3194845Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3194977Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3195097Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3195199Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3195234Z inline_call [] 2025-09-07T07:34:42.3195287Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3195358Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3195427Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3195679Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3196803Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3196855Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3197010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3197095Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3197225Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3197342Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3197412Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3197476Z inline_call [] 2025-09-07T07:34:42.3197531Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3197601Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3197670Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3197922Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3198035Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3198084Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3198234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3198317Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3198447Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3198568Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3198780Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-6963000781352540.xml - 2025-09-07T07:34:42.3198837Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3199197Z FAILED [0.2443s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3200325Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3200327Z 2025-09-07T07:34:42.3200535Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3200565Z 2025-09-07T07:34:42.3200567Z 2025-09-07T07:34:42.3200639Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3200824Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3200827Z 2025-09-07T07:34:42.3200947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3201008Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3201079Z ============= 1 failed, 6 passed, 37 deselected, 2 rerun in 8.37s ============== 2025-09-07T07:34:42.3201114Z Got exit code 1 2025-09-07T07:34:42.3201151Z Retrying single test... 2025-09-07T07:34:42.3201576Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3201615Z import pkg_resources 2025-09-07T07:34:42.3201785Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-a34293b586ee1967.xml 2025-09-07T07:34:42.3201839Z ============================= test session starts ============================== 2025-09-07T07:34:42.3201955Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3201993Z cachedir: .pytest_cache 2025-09-07T07:34:42.3202151Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3202195Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3202233Z configfile: pytest.ini 2025-09-07T07:34:42.3202413Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3202490Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3202709Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3203702Z Running 1 items in this shard 2025-09-07T07:34:42.3203705Z 2025-09-07T07:34:42.3203959Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:39.510831985 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3204066Z [W907 07:21:39.571362181 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3204167Z [W907 07:21:40.588097503 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3204218Z ('RERUN', {'yellow': True}) [0.4110s] [100%] 2025-09-07T07:34:42.3204463Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:40.913889219 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3204564Z [W907 07:21:40.926049540 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3204609Z ('RERUN', {'yellow': True}) [0.2416s] [100%] 2025-09-07T07:34:42.3204855Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:40.157699806 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3204955Z [W907 07:21:40.203932393 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3204992Z FAILED [0.2967s] [100%] 2025-09-07T07:34:42.3204994Z 2025-09-07T07:34:42.3205043Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3205162Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3205206Z Traceback (most recent call last): 2025-09-07T07:34:42.3205347Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3205381Z self._run_test( 2025-09-07T07:34:42.3205496Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3205582Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3205623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3205757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3205803Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3205840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3207004Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3207053Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3207091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3207228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3207271Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3207307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3207452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3207533Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3207570Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3207725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3207799Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3207951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3208003Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3208044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3208185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3208238Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3208276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3208393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3208459Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3208502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3208630Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3209635Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3209677Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3209818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3209861Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3209900Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3210038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3210077Z return aot_autograd( 2025-09-07T07:34:42.3210111Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3210247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3210319Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3210387Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3210548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3210632Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3210676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3210893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3210936Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3211122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3211161Z fx_g = _create_graph( 2025-09-07T07:34:42.3211196Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3211360Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3211393Z fx_g = make_fx( 2025-09-07T07:34:42.3211426Z ^^^^^^^^ 2025-09-07T07:34:42.3212512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3212560Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3212599Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3212747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3212789Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3212825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3212983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3213047Z t = dispatch_trace( 2025-09-07T07:34:42.3213080Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3213193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3213234Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3213269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3213392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3213433Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3213468Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3213630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3213709Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3213750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3213877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3213916Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3213951Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3215037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3215079Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3215113Z ^^^^^^^^^ 2025-09-07T07:34:42.3215247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3215287Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3215322Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3215470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3215520Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3215555Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3215733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3215793Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3215837Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3216011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3216050Z outs_pair = fn(*args) 2025-09-07T07:34:42.3216112Z ^^^^^^^^^ 2025-09-07T07:34:42.3216285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3216351Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3216396Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3216646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3216686Z outs_pair = fn(*args) 2025-09-07T07:34:42.3216719Z ^^^^^^^^^ 2025-09-07T07:34:42.3216901Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3217916Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3217962Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3218157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3218228Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3218272Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3218478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3218519Z outs_pair = fn(*args) 2025-09-07T07:34:42.3218553Z ^^^^^^^^^ 2025-09-07T07:34:42.3218742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3218787Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3218822Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3218993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3219039Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3219075Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3219201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3219246Z return handle_torch_function( 2025-09-07T07:34:42.3219283Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3219424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3219498Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3219542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3219711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3219751Z return func(*args, **kwargs) 2025-09-07T07:34:42.3220719Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3220847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3220889Z result = _engine_run_backward( 2025-09-07T07:34:42.3220924Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3221074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3221222Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3221271Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3221396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3221473Z return user_fn(self, *args) 2025-09-07T07:34:42.3221508Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3221654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3221697Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3221733Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3221890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3221936Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3221971Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3222095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3222134Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3222168Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3222336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3222386Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3223360Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3223498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3223546Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3223609Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3223772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3223819Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3223858Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3224016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3224054Z t = dispatch_trace( 2025-09-07T07:34:42.3224089Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3224202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3224243Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3224279Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3224402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3224444Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3224477Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3224638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3224716Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3224756Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3224882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3224919Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3224952Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3226010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3226051Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3226085Z ^^^^^^^^^ 2025-09-07T07:34:42.3226237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3226308Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3226341Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3226382Z File "", line 1, in 2025-09-07T07:34:42.3226583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3226661Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3226743Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3226883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3226930Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3226967Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3227160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3227205Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3227240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3227411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3227454Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3227490Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.3227634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3227675Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3228654Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3228789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3228904Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3228954Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3229081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3229141Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3229184Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3229312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3229350Z leaves = list(leaves) 2025-09-07T07:34:42.3229384Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3229506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3229541Z return func(x) 2025-09-07T07:34:42.3229572Z ^^^^^^^ 2025-09-07T07:34:42.3229711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3229778Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3229819Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3229986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3230028Z return func(*args, **kwargs) 2025-09-07T07:34:42.3230063Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3230247Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3230332Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3230335Z 2025-09-07T07:34:42.3230542Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3230567Z 2025-09-07T07:34:42.3230568Z 2025-09-07T07:34:42.3231576Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3231763Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3231765Z 2025-09-07T07:34:42.3231851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3231960Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3231995Z inline_call [] 2025-09-07T07:34:42.3232049Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3232082Z inductor [] 2025-09-07T07:34:42.3232156Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3232227Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3232488Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3232603Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3232654Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3232804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3232891Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3233022Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3233142Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3233239Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3233301Z Traceback (most recent call last): 2025-09-07T07:34:42.3233437Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3233472Z self._run_test( 2025-09-07T07:34:42.3233584Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3234576Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3234616Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3234753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3234798Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3234836Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3234986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3235031Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3235071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3235209Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3235251Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3235287Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3235430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3235512Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3235550Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3235702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3235746Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3235895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3235969Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3236008Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3236150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3236200Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3236239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3237391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3237461Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3237503Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3237630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3237692Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3237737Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3237877Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3237921Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3237957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3238095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3238134Z return aot_autograd( 2025-09-07T07:34:42.3238169Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3238307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3238376Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3238421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3238601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3238686Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3238731Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3238914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3238957Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3239144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3240128Z fx_g = _create_graph( 2025-09-07T07:34:42.3240206Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3240371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3240407Z fx_g = make_fx( 2025-09-07T07:34:42.3240440Z ^^^^^^^^ 2025-09-07T07:34:42.3240591Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3240636Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3240672Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3240821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3240865Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3240901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3241059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3241095Z t = dispatch_trace( 2025-09-07T07:34:42.3241129Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3241242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3241309Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3241343Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3241468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3241507Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3241542Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3241736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3241816Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3242796Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3242921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3242959Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3242995Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3243123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3243164Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3243198Z ^^^^^^^^^ 2025-09-07T07:34:42.3243330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3243369Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3243404Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3243554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3243602Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3243636Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3243792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3243872Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3243917Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3244093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3244131Z outs_pair = fn(*args) 2025-09-07T07:34:42.3244166Z ^^^^^^^^^ 2025-09-07T07:34:42.3244337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3244404Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3244447Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3245551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3245590Z outs_pair = fn(*args) 2025-09-07T07:34:42.3245628Z ^^^^^^^^^ 2025-09-07T07:34:42.3245804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3245864Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3245905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3246105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3246176Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3246222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3246394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3246431Z outs_pair = fn(*args) 2025-09-07T07:34:42.3246466Z ^^^^^^^^^ 2025-09-07T07:34:42.3246750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3246794Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3246830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3246998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3247085Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3247121Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3247249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3247290Z return handle_torch_function( 2025-09-07T07:34:42.3247325Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3248410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3248487Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3248531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3248700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3248740Z return func(*args, **kwargs) 2025-09-07T07:34:42.3248774Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3248901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3248942Z result = _engine_run_backward( 2025-09-07T07:34:42.3248977Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3249125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3249246Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3249319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3249446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3249487Z return user_fn(self, *args) 2025-09-07T07:34:42.3249522Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3249668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3249711Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3249746Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3249904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3249946Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3249982Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3250106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3251075Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3251110Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3251276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3251327Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3251369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3251505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3251554Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3251590Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3251752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3251827Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3251865Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3252025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3252062Z t = dispatch_trace( 2025-09-07T07:34:42.3252096Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3252209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3252284Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3252319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3252443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3252481Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3252515Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3252674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3253683Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3253724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3253848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3253885Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3253919Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3254046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3254087Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3254120Z ^^^^^^^^^ 2025-09-07T07:34:42.3254269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3254317Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3254374Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3254415Z File "", line 1, in 2025-09-07T07:34:42.3254558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3254634Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3254678Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3254816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3254862Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3254900Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3255092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3255135Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3255170Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3255342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3256312Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3256350Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3256551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3256594Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3256630Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3256764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3256850Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3256896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3257023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3257113Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3257155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3257281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3257318Z leaves = list(leaves) 2025-09-07T07:34:42.3257353Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3257514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3257549Z return func(x) 2025-09-07T07:34:42.3257581Z ^^^^^^^ 2025-09-07T07:34:42.3257719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3257783Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3257826Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3257997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3258981Z return func(*args, **kwargs) 2025-09-07T07:34:42.3259016Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3259197Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3259284Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3259286Z 2025-09-07T07:34:42.3259494Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3259496Z 2025-09-07T07:34:42.3259498Z 2025-09-07T07:34:42.3259570Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3259781Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3259787Z 2025-09-07T07:34:42.3259871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3259945Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3259979Z inline_call [] 2025-09-07T07:34:42.3260033Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3260066Z inductor [] 2025-09-07T07:34:42.3260142Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3260212Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3260471Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3260588Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3260640Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3260791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3260875Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3261007Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3261128Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3261199Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3262160Z inline_call [] 2025-09-07T07:34:42.3262213Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3262285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3262374Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3262629Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3262743Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3262792Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3262969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3263054Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3263182Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3263299Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3263351Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3263448Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3263491Z Traceback (most recent call last): 2025-09-07T07:34:42.3263627Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3263662Z self._run_test( 2025-09-07T07:34:42.3263775Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3263829Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3263868Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3264000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3264047Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3265035Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3265191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3265237Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3265275Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3265411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3265453Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3265492Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3265633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3265713Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3265751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3265906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3265953Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3266104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3266157Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3266196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3266342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3266392Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3266431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3266608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3266674Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3266743Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3266869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3267870Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3267912Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3268052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3268142Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3268178Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3268316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3268354Z return aot_autograd( 2025-09-07T07:34:42.3268389Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3268527Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3268601Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3268645Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3268806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3268889Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3268935Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3269117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3269160Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3269345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3269410Z fx_g = _create_graph( 2025-09-07T07:34:42.3269444Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3269607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3269640Z fx_g = make_fx( 2025-09-07T07:34:42.3270606Z ^^^^^^^^ 2025-09-07T07:34:42.3270758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3270807Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3270843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3270990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3271032Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3271067Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3271230Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3271267Z t = dispatch_trace( 2025-09-07T07:34:42.3271300Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3271413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3271454Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3271488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3271616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3271654Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3271689Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3271850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3271929Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3271971Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3272123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3272160Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3272197Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3273250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3273292Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3273325Z ^^^^^^^^^ 2025-09-07T07:34:42.3273489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3273528Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3273563Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3273711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3273761Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3273795Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3273952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3274014Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3274057Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3274235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3274273Z outs_pair = fn(*args) 2025-09-07T07:34:42.3274308Z ^^^^^^^^^ 2025-09-07T07:34:42.3274481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3274547Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3274609Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3274784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3274822Z outs_pair = fn(*args) 2025-09-07T07:34:42.3274855Z ^^^^^^^^^ 2025-09-07T07:34:42.3275031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3276019Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3276065Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3276262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3276331Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3276376Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3276616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3276655Z outs_pair = fn(*args) 2025-09-07T07:34:42.3276687Z ^^^^^^^^^ 2025-09-07T07:34:42.3276877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3276921Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3276960Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3277133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3277179Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3277215Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3277341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3277409Z return handle_torch_function( 2025-09-07T07:34:42.3277445Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3277586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3277659Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3277704Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3277905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3278885Z return func(*args, **kwargs) 2025-09-07T07:34:42.3278922Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3279046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3279085Z result = _engine_run_backward( 2025-09-07T07:34:42.3279123Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3279268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3279390Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3279437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3279567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3279607Z return user_fn(self, *args) 2025-09-07T07:34:42.3279644Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3279787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3279830Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3279866Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3280053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3280096Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3280132Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3280295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3280334Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3280369Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3280535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3280585Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3281560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3281697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3281751Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3281788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3281949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3281995Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3282034Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3282194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3282233Z t = dispatch_trace( 2025-09-07T07:34:42.3282266Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3282378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3282420Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3282455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3282579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3282638Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3282673Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3282833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3282911Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3282952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3283110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3283148Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3284121Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3284250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3284290Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3284324Z ^^^^^^^^^ 2025-09-07T07:34:42.3284474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3284522Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3284555Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3284595Z File "", line 1, in 
2025-09-07T07:34:42.3284739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3284818Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3284863Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3284997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3285044Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3285081Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3285292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3285334Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3285369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3285542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3285585Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3285623Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3285765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3285807Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3286810Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3286946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3287037Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3287083Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3287207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3287268Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3287310Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3287439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3287476Z leaves = list(leaves) 2025-09-07T07:34:42.3287510Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3287632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3287666Z return func(x) 2025-09-07T07:34:42.3287698Z ^^^^^^^ 2025-09-07T07:34:42.3287869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.3287933Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3287974Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3288141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3288182Z return func(*args, **kwargs) 2025-09-07T07:34:42.3288251Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3288432Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3288517Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3288519Z 2025-09-07T07:34:42.3289664Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3289669Z 2025-09-07T07:34:42.3289671Z 2025-09-07T07:34:42.3289744Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3289930Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3289933Z 2025-09-07T07:34:42.3290017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3290092Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3290126Z inline_call [] 2025-09-07T07:34:42.3290180Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3290213Z inductor [] 2025-09-07T07:34:42.3290286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3290356Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3290648Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3290762Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3290813Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3290964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3291050Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3291180Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3291299Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3291370Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3291405Z inline_call [] 2025-09-07T07:34:42.3291459Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3291530Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3291599Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3292789Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3292904Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3292953Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3293108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3293195Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3293344Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3293464Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3293534Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3293567Z inline_call [] 2025-09-07T07:34:42.3293619Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3293721Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3293790Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3294042Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3294154Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3294203Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3294354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3294437Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3294566Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3294684Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3294899Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-a34293b586ee1967.xml - 2025-09-07T07:34:42.3294955Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3296252Z FAILED [0.2967s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3296339Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3296343Z 2025-09-07T07:34:42.3296631Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3296634Z 2025-09-07T07:34:42.3296636Z 2025-09-07T07:34:42.3296708Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3296892Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3296894Z 2025-09-07T07:34:42.3296978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3297040Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3297105Z ================== 1 failed, 245 deselected, 2 rerun in 1.12s ================== 2025-09-07T07:34:42.3297140Z Got exit code 1 2025-09-07T07:34:42.3297177Z Retrying single test... 2025-09-07T07:34:42.3297600Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3297639Z import pkg_resources 2025-09-07T07:34:42.3297807Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-047d8270e50b818e.xml 2025-09-07T07:34:42.3297862Z ============================= test session starts ============================== 2025-09-07T07:34:42.3298008Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3298046Z cachedir: .pytest_cache 2025-09-07T07:34:42.3298203Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3298247Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3298285Z configfile: pytest.ini 2025-09-07T07:34:42.3298500Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3298578Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3298799Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3299803Z Running 1 items in this shard 2025-09-07T07:34:42.3299807Z 2025-09-07T07:34:42.3300062Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:49.664137108 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3300168Z [W907 07:21:49.722812690 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3300270Z [W907 07:21:49.740121965 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3300317Z ('RERUN', {'yellow': True}) [0.4180s] [100%] 2025-09-07T07:34:42.3300566Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:49.098239722 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3300665Z [W907 07:21:49.111085822 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3300710Z ('RERUN', {'yellow': True}) [0.2697s] [100%] 2025-09-07T07:34:42.3300979Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True [W907 07:21:49.356883000 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3301079Z [W907 07:21:49.368060745 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3301114Z FAILED [0.2339s] [100%] 2025-09-07T07:34:42.3301117Z 2025-09-07T07:34:42.3301166Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3301265Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3301307Z Traceback (most recent call last): 2025-09-07T07:34:42.3301448Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3301482Z self._run_test( 2025-09-07T07:34:42.3301597Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3301653Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3301693Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3301828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3301874Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3302846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3303001Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3303047Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3303085Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3303220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3303264Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3303303Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3303467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3303546Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3303585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3303736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3303781Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3303958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3304012Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3304051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3304194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3304246Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3304285Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3304401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3304467Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3304510Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3305570Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3305635Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3305676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3305816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3305861Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3305916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3306056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3306095Z return aot_autograd( 2025-09-07T07:34:42.3306128Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3306265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3306334Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3306380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3306606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3306690Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3306734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3306920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3306962Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3307150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3307188Z fx_g = _create_graph( 2025-09-07T07:34:42.3307223Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3307390Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3307424Z fx_g = make_fx( 2025-09-07T07:34:42.3308400Z ^^^^^^^^ 2025-09-07T07:34:42.3308554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3308599Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3308668Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3308816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3308858Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3308894Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3309053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3309089Z t = dispatch_trace( 2025-09-07T07:34:42.3309167Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3309280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3309321Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3309357Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3309482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3309525Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3309560Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3309721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3309800Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3309841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3309967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3310007Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3310041Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3311102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3311143Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3311177Z ^^^^^^^^^ 2025-09-07T07:34:42.3311310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3311384Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3311418Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3311568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3311616Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3311650Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3311810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3311872Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3311916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3312092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3312134Z outs_pair = fn(*args) 2025-09-07T07:34:42.3312169Z ^^^^^^^^^ 2025-09-07T07:34:42.3312342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3312408Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3312452Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3312626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3312664Z outs_pair = fn(*args) 2025-09-07T07:34:42.3312698Z ^^^^^^^^^ 2025-09-07T07:34:42.3313803Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3313864Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3313909Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3314121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3314191Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3314235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3314442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3314480Z outs_pair = fn(*args) 2025-09-07T07:34:42.3314514Z ^^^^^^^^^ 2025-09-07T07:34:42.3314703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3314747Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3314782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3314957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3315001Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3315038Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3315163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3315205Z return handle_torch_function( 2025-09-07T07:34:42.3315240Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3315384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3315458Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3315503Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3315670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3316842Z return func(*args, **kwargs) 2025-09-07T07:34:42.3316881Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3317008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3317048Z result = _engine_run_backward( 2025-09-07T07:34:42.3317083Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3317231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3317353Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3317401Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3317527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3317570Z return user_fn(self, *args) 2025-09-07T07:34:42.3317606Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3317750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3317792Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3317828Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3317988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3318035Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3318071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3318196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3318235Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3318270Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3318435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3319473Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3319513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3319651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3319698Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3319736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3319939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3319987Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3320025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3320216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3320254Z t = dispatch_trace( 2025-09-07T07:34:42.3320289Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3320402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3320444Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3320479Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3320604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3320642Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3320677Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3320838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3320916Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3320957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3321080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3321142Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3322114Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3322243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3322283Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3322317Z ^^^^^^^^^ 2025-09-07T07:34:42.3322470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3322518Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3322551Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3322592Z File "", line 1, in 2025-09-07T07:34:42.3322733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3322812Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3322860Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3322996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3323042Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3323079Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3323274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3323316Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3323351Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3323523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3323566Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3323602Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.3323767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3324733Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3324769Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3324906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3324995Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3325070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3325197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3325256Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3325299Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3325423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3325463Z leaves = list(leaves) 2025-09-07T07:34:42.3325497Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3325621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3325654Z return func(x) 2025-09-07T07:34:42.3325686Z ^^^^^^^ 2025-09-07T07:34:42.3325824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.3325889Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3325929Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3326098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3326138Z return func(*args, **kwargs) 2025-09-07T07:34:42.3326174Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3326370Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3326455Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3326458Z 2025-09-07T07:34:42.3327670Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3327673Z 2025-09-07T07:34:42.3327675Z 2025-09-07T07:34:42.3327752Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3327939Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3327941Z 2025-09-07T07:34:42.3328027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3328100Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3328138Z inline_call [] 2025-09-07T07:34:42.3328191Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3328225Z inductor [] 2025-09-07T07:34:42.3328298Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3328369Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3328632Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3328749Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3328798Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3328949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3329034Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3329201Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3329319Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3329419Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3329461Z Traceback (most recent call last): 2025-09-07T07:34:42.3329641Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3329676Z self._run_test( 2025-09-07T07:34:42.3330732Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3330788Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3330827Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3330959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3331006Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3331044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3331194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3331239Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3331277Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3331415Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3331457Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3331494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3331636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3331741Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3331780Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3331932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3331978Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3332128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3332181Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3332221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3332361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3332411Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3333383Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3333502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3333568Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3333611Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3333736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3333799Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3333841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3333982Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3334024Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3334060Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3334202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3334258Z return aot_autograd( 2025-09-07T07:34:42.3334292Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3334428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3334497Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3334541Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3334727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3334810Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3334855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3335037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3335081Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3335267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3336233Z fx_g = _create_graph( 2025-09-07T07:34:42.3336266Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3336430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3336463Z fx_g = make_fx( 2025-09-07T07:34:42.3336548Z ^^^^^^^^ 2025-09-07T07:34:42.3336701Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3336747Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3336784Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3336931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3337000Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3337036Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3337195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3337232Z t = dispatch_trace( 2025-09-07T07:34:42.3337265Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3337379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3337420Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3337456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3337579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3337617Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3337652Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3337814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3338837Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3338879Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3339003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3339040Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3339075Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3339202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3339244Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3339277Z ^^^^^^^^^ 2025-09-07T07:34:42.3339410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3339449Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3339483Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3339658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3339708Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3339740Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3339897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3339957Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3340042Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3340220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3340259Z outs_pair = fn(*args) 2025-09-07T07:34:42.3340292Z ^^^^^^^^^ 2025-09-07T07:34:42.3340467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3340535Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3341512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3341685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3341724Z outs_pair = fn(*args) 2025-09-07T07:34:42.3341757Z ^^^^^^^^^ 2025-09-07T07:34:42.3341937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3341995Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3342037Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3342230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3342323Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3342367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3342539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3342577Z outs_pair = fn(*args) 2025-09-07T07:34:42.3342610Z ^^^^^^^^^ 2025-09-07T07:34:42.3342801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3342845Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3342882Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3343050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3343097Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3343134Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3343261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3343302Z return handle_torch_function( 2025-09-07T07:34:42.3344263Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3344406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3344481Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3344525Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3344693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3344732Z return func(*args, **kwargs) 2025-09-07T07:34:42.3344768Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3344894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3344954Z result = _engine_run_backward( 2025-09-07T07:34:42.3344988Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3345135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3345255Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3345332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3345459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3345500Z return user_fn(self, *args) 2025-09-07T07:34:42.3345534Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3345680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3345724Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3345759Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3345918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3345960Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3345996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3347120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3347161Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3347196Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3347366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3347417Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3347486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3347625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3347674Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3347711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3347876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3347922Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3347962Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3348121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3348159Z t = dispatch_trace( 2025-09-07T07:34:42.3348192Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3348306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3348351Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3348387Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3348510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3348548Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3348582Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3349681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3349762Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3349803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3349927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3349965Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3349998Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3350132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3350204Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3350237Z ^^^^^^^^^ 2025-09-07T07:34:42.3350387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3350435Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3350469Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3350509Z File "", line 1, in 2025-09-07T07:34:42.3350688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3350766Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3350811Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3350946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3350996Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3351033Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3351227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3351269Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3351304Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3352415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3352460Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3352496Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3352640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3352701Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3352738Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3352872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3352960Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3353007Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3353137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3353196Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3353238Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3353363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3353401Z leaves = list(leaves) 2025-09-07T07:34:42.3353435Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3353560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3353595Z return func(x) 2025-09-07T07:34:42.3353627Z ^^^^^^^ 2025-09-07T07:34:42.3353765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3353829Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3353869Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3354036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3355008Z return func(*args, **kwargs) 2025-09-07T07:34:42.3355043Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3355224Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3355311Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3355331Z 2025-09-07T07:34:42.3355538Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3355541Z 2025-09-07T07:34:42.3355542Z 2025-09-07T07:34:42.3355615Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3355828Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3355830Z 2025-09-07T07:34:42.3355915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3355989Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3356023Z inline_call [] 2025-09-07T07:34:42.3356077Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3356113Z inductor [] 2025-09-07T07:34:42.3356187Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3356258Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3356574Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3356688Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3356741Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3356892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3356978Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3357109Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3357257Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3358280Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3358316Z inline_call [] 2025-09-07T07:34:42.3358368Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3358441Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3358513Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3358770Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3358882Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3358931Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3359083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3359169Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3359301Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3359423Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3359474Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3359573Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3359615Z Traceback (most recent call last): 2025-09-07T07:34:42.3359752Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3359787Z self._run_test( 2025-09-07T07:34:42.3359901Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3359985Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3360024Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3360210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3361195Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3361234Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3361422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3361468Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3361506Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3361643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3361687Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3361725Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3361867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3361948Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3361986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3362138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3362183Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3362335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3362387Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3362427Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3362569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3362640Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3362678Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3362794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3362859Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3362902Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3363958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3364023Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3364063Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3364206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3364252Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3364289Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3364425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3364464Z return aot_autograd( 2025-09-07T07:34:42.3364498Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3364634Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3364705Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3364749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3364909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3364991Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3365038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3365243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3365285Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3365471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3365511Z fx_g = _create_graph( 2025-09-07T07:34:42.3365570Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3365736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3365769Z fx_g = make_fx( 2025-09-07T07:34:42.3366802Z ^^^^^^^^ 2025-09-07T07:34:42.3366954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3367002Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3367039Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3367186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3367228Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3367264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3367423Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3367463Z t = dispatch_trace( 2025-09-07T07:34:42.3367496Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3367609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3367649Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3367684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3367808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3367886Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3367920Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3368081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3368161Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3368201Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3368328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3368365Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3369341Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3369471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3369514Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3369551Z ^^^^^^^^^ 2025-09-07T07:34:42.3369685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3369724Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3369759Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3369909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3369958Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3369990Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3370148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3370210Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3370254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3370432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3370746Z outs_pair = fn(*args) 2025-09-07T07:34:42.3370854Z ^^^^^^^^^ 2025-09-07T07:34:42.3372692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3372987Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3373142Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3373468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3373726Z outs_pair = fn(*args) 2025-09-07T07:34:42.3373825Z ^^^^^^^^^ 2025-09-07T07:34:42.3374063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3374342Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3374482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3374759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3375064Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3375217Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3375482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3375734Z outs_pair = fn(*args) 2025-09-07T07:34:42.3375844Z ^^^^^^^^^ 2025-09-07T07:34:42.3376194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3378230Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3378392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3379053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3379321Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3379440Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3381606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3381904Z return handle_torch_function( 2025-09-07T07:34:42.3382070Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3382356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3382615Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3382777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3383044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3383301Z return func(*args, **kwargs) 2025-09-07T07:34:42.3383415Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3383607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3384953Z result = _engine_run_backward( 2025-09-07T07:34:42.3385067Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3385286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3385599Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3385804Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3386018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3386265Z return user_fn(self, *args) 2025-09-07T07:34:42.3386370Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3386656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3386882Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3386992Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3388352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3388595Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3388708Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3388905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3389105Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3389206Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3389437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3389692Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3389818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3390034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3390254Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3391444Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3391686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3391932Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3392049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3392283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3392546Z t = dispatch_trace( 2025-09-07T07:34:42.3392638Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3392808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3392998Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3393108Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3393301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3394521Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3394626Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3394851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3395127Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3395281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3395483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3395681Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3395779Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3395966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3396168Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3396272Z ^^^^^^^^^ 2025-09-07T07:34:42.3397632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3397873Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3397988Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3398085Z File "", line 1, in 
2025-09-07T07:34:42.3398305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3398595Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3398751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3398971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3399189Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3399306Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3399608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3401006Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3401122Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3401363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3401616Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3401729Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3401945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3402166Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3402273Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3402474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3402732Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3402898Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3404116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3404339Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3404475Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3404715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3404914Z leaves = list(leaves) 2025-09-07T07:34:42.3405011Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3405196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3405389Z return func(x) 2025-09-07T07:34:42.3405473Z ^^^^^^^ 2025-09-07T07:34:42.3405662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3405900Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3407166Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3407418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3407660Z return func(*args, **kwargs) 2025-09-07T07:34:42.3407769Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3408018Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3408319Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3408441Z 2025-09-07T07:34:42.3408650Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3408895Z 2025-09-07T07:34:42.3408897Z 2025-09-07T07:34:42.3408970Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3409263Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3409483Z 2025-09-07T07:34:42.3409571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3409798Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3410991Z inline_call [] 2025-09-07T07:34:42.3411099Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3411222Z inductor [] 2025-09-07T07:34:42.3411344Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3411522Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3411944Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3412351Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3412551Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3412788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3413065Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3413317Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3414642Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3414870Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3415012Z inline_call [] 2025-09-07T07:34:42.3415117Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3415277Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3415453Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3415815Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3416244Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3416440Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3416733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3417000Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3418297Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3418581Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3418805Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3418945Z inline_call [] 2025-09-07T07:34:42.3419046Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3419208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3419383Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3419744Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3420147Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3420343Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3420576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3421868Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3422116Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3422430Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3422798Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-047d8270e50b818e.xml - 2025-09-07T07:34:42.3423102Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3423589Z FAILED [0.2339s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3424056Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3424176Z 2025-09-07T07:34:42.3424385Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3424627Z 2025-09-07T07:34:42.3424629Z 2025-09-07T07:34:42.3424702Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3424992Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.3425212Z 2025-09-07T07:34:42.3426355Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3426616Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.3426780Z ================== 1 failed, 245 deselected, 2 rerun in 1.10s ================== 2025-09-07T07:34:42.3426917Z Got exit code 1 2025-09-07T07:34:42.3427095Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.3427688Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3428211Z import pkg_resources 2025-09-07T07:34:42.3428441Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ad61b8683529ac7c.xml 2025-09-07T07:34:42.3428699Z ============================= test session starts ============================== 2025-09-07T07:34:42.3428907Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3429091Z cachedir: .pytest_cache 2025-09-07T07:34:42.3430376Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3430610Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3430721Z configfile: pytest.ini 2025-09-07T07:34:42.3430946Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3431217Z collecting ... collected 467 items / 44 deselected / 423 selected 2025-09-07T07:34:42.3431375Z stepcurrent: skipping 44 already run items. 
2025-09-07T07:34:42.3431498Z Running 202 items in this shard 2025-09-07T07:34:42.3431567Z 2025-09-07T07:34:42.3431824Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_False [W907 07:21:57.845205667 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3432216Z [W907 07:21:57.292637145 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3432462Z [W907 07:21:57.346118205 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3432701Z [W907 07:21:59.105289938 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3434008Z [W907 07:21:59.117038275 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3434245Z [W907 07:21:59.133451051 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3434480Z [W907 07:21:59.147164139 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3434715Z [W907 07:21:59.166457634 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3434992Z [W907 07:21:59.170934088 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3435228Z [W907 07:21:59.171195784 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3435462Z [W907 07:21:59.171316862 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3435696Z [W907 07:21:59.171432191 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3435938Z [W907 07:21:59.171545029 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3436171Z [W907 07:21:59.171661276 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3436403Z [W907 07:21:59.171769176 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3437761Z [W907 07:21:59.171876534 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3437998Z [W907 07:21:59.171983142 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3438231Z [W907 07:21:59.172167539 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3438464Z [W907 07:21:59.172279327 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3438697Z [W907 07:21:59.172375177 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3438958Z [W907 07:21:59.172472215 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3439193Z [W907 07:21:59.172565034 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3439426Z [W907 07:21:59.172659392 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3439660Z [W907 07:21:59.172753361 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3439897Z [W907 07:21:59.172844819 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3440130Z [W907 07:21:59.172936728 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3441479Z [W907 07:21:59.173081896 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3441715Z [W907 07:21:59.173805785 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3441952Z [W907 07:21:59.173929643 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3442186Z [W907 07:21:59.174041112 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3442421Z [W907 07:21:59.174147120 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3442655Z [W907 07:21:59.174264289 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3442891Z [W907 07:21:59.174357807 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3443126Z [W907 07:21:59.174459276 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3443360Z [W907 07:21:59.174556104 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3443593Z [W907 07:21:59.174657142 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3443857Z [W907 07:21:59.174823670 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3445128Z [W907 07:21:59.174933568 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3445363Z [W907 07:21:59.175030957 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3445596Z [W907 07:21:59.175124736 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3445872Z [W907 07:21:59.175221084 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3446107Z [W907 07:21:59.175312323 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3446340Z [W907 07:21:59.175403022 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3446645Z [W907 07:21:59.175491290 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3446882Z [W907 07:21:59.175576999 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3447118Z [W907 07:21:59.176159460 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3447350Z [W907 07:21:59.176269199 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3448632Z [W907 07:21:59.176368747 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3448870Z [W907 07:21:59.176468576 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3449103Z [W907 07:21:59.176565215 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3449336Z [W907 07:21:59.176724422 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3449570Z [W907 07:21:59.176822981 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3449836Z [W907 07:21:59.176912189 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3450067Z [W907 07:21:59.177014138 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3450302Z [W907 07:21:59.177326633 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3450474Z PASSED [2.4402s] [ 0%] 2025-09-07T07:34:42.3450784Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:00.685022481 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3451161Z [W907 07:22:00.696054277 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3452393Z ('RERUN', {'yellow': True}) [0.9251s] [ 0%] 2025-09-07T07:34:42.3452723Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:01.630608397 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3453112Z [W907 07:22:01.642059037 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3453294Z ('RERUN', {'yellow': True}) [0.7220s] [ 0%] 2025-09-07T07:34:42.3453615Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:01.321276020 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3453993Z [W907 07:22:01.333048066 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3454165Z FAILED [0.6878s] [ 0%] 2025-09-07T07:34:42.3454224Z 2025-09-07T07:34:42.3454276Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3454462Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3454642Z Traceback (most recent call last): 2025-09-07T07:34:42.3454889Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3456160Z self._run_test( 2025-09-07T07:34:42.3456330Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3456599Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3456727Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3456992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3457207Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3457322Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3457545Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3457777Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3457897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3458104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3458319Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3459470Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3459686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3459947Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3460100Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3460326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3460560Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3460784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3461054Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3461180Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3461396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3461624Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3462756Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3462951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3463169Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3463311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3463518Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3463743Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3463883Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3464101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3464320Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3464432Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3464641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3465852Z return aot_autograd( 2025-09-07T07:34:42.3465954Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3466151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3466392Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3466617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3466864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3467173Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3467337Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3467602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3467864Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3468165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3469453Z fx_g = _create_graph( 2025-09-07T07:34:42.3469552Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3469773Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3470009Z fx_g = make_fx( 2025-09-07T07:34:42.3470093Z ^^^^^^^^ 2025-09-07T07:34:42.3470294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3470527Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3470644Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3470862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3471086Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3471196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3472425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3472658Z t = dispatch_trace( 2025-09-07T07:34:42.3472750Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3472917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3473137Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3473245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3473439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3473637Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3473740Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3473967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3474241Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3475417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3475618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3475816Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3475915Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3476107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3476310Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3476416Z ^^^^^^^^^ 2025-09-07T07:34:42.3476679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3476887Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3476991Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3477207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3478499Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3478616Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3478828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3479083Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3479249Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3479501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3479750Z outs_pair = fn(*args) 2025-09-07T07:34:42.3479846Z ^^^^^^^^^ 2025-09-07T07:34:42.3480076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3480438Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3481619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3481878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3482125Z outs_pair = fn(*args) 2025-09-07T07:34:42.3482219Z ^^^^^^^^^ 2025-09-07T07:34:42.3482454Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3482727Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3482862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3483134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3483437Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3483586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3483838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3485637Z outs_pair = fn(*args) 2025-09-07T07:34:42.3485780Z ^^^^^^^^^ 2025-09-07T07:34:42.3486031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3486303Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3486418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3486731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3486982Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3487100Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3487296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3487498Z return handle_torch_function( 2025-09-07T07:34:42.3487603Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3488826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3489087Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3489240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3489490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3489735Z return func(*args, **kwargs) 2025-09-07T07:34:42.3489840Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3490033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3490234Z result = _engine_run_backward( 2025-09-07T07:34:42.3490340Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3490554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3490857Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3492061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3492276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3492479Z return user_fn(self, *args) 2025-09-07T07:34:42.3492582Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3492791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3493061Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3493172Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3493398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3493637Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3493748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3493943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3495111Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3495216Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3495443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3495695Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3495820Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3496036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3496257Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3496378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3496676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3496952Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3497069Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3498278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3498510Z t = dispatch_trace( 2025-09-07T07:34:42.3498603Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3498771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3498961Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3499071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3499263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3499460Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3499560Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3499782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3500060Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3501169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3501369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3501567Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3501666Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3501853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3502055Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3502159Z ^^^^^^^^^ 2025-09-07T07:34:42.3502371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3502604Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3502722Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3502853Z File "", line 1, in 2025-09-07T07:34:42.3504033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3504292Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3504448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3504716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3504936Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3505053Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3505316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3505587Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3505700Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3505943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3506192Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3507348Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.3507566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3507787Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3507897Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3508098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3508355Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3508520Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3508756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3508980Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3509114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3509318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3510483Z leaves = list(leaves) 2025-09-07T07:34:42.3510582Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3510765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3510959Z return func(x) 2025-09-07T07:34:42.3511043Z ^^^^^^^ 2025-09-07T07:34:42.3511231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3511467Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3511606Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3511851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3512094Z return func(*args, **kwargs) 2025-09-07T07:34:42.3512197Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3513405Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3513708Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3513831Z 2025-09-07T07:34:42.3514041Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3514283Z 2025-09-07T07:34:42.3514285Z 2025-09-07T07:34:42.3514359Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3514651Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3514899Z 2025-09-07T07:34:42.3514986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3515181Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3515325Z inline_call [] 2025-09-07T07:34:42.3515429Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3515641Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3515820Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3517230Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3517637Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3517841Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3518078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3518349Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3518602Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3518889Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3519141Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3519315Z Traceback (most recent call last): 2025-09-07T07:34:42.3519523Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3520750Z self._run_test( 2025-09-07T07:34:42.3520957Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3521159Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3521284Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3521490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3521702Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3521818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3522043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3522273Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3522390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3522596Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3523786Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3523903Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3524116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3524372Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3524523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3524751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3524984Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3525207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3525445Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3525570Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3525787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3527088Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3527216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3527406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3527624Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3527767Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3528018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3528244Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3528381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3528601Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3528649Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3528685Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3528826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3528864Z return aot_autograd( 2025-09-07T07:34:42.3528900Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3530015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3530088Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3530133Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3530299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3530382Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3530458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3530642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3530686Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3530874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3530914Z fx_g = _create_graph( 2025-09-07T07:34:42.3530951Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3531114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3531148Z fx_g = make_fx( 2025-09-07T07:34:42.3531181Z ^^^^^^^^ 2025-09-07T07:34:42.3531334Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3531383Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3531420Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3531568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3531610Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3531647Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3531806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3532790Z t = dispatch_trace( 2025-09-07T07:34:42.3532825Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3532940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3532981Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3533017Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3533142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3533210Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3533245Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3533408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3533489Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3533529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3533684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3533722Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3533757Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3533882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3533924Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3533958Z ^^^^^^^^^ 2025-09-07T07:34:42.3534092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3534131Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3534167Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3534315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3534365Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3535568Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3535730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3535792Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3535837Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3536014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3536080Z outs_pair = fn(*args) 2025-09-07T07:34:42.3536113Z ^^^^^^^^^ 2025-09-07T07:34:42.3536287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3536354Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3536398Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3536633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3536672Z outs_pair = fn(*args) 2025-09-07T07:34:42.3536705Z ^^^^^^^^^ 2025-09-07T07:34:42.3536888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3536952Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3536995Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3537193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3537263Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3537308Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3537485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3537523Z outs_pair = fn(*args) 2025-09-07T07:34:42.3537556Z ^^^^^^^^^ 2025-09-07T07:34:42.3538705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3538751Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3538821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3538990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3539036Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3539072Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3539200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3539275Z return handle_torch_function( 2025-09-07T07:34:42.3539312Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3539453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3539527Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3539571Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3539745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3539787Z return func(*args, **kwargs) 2025-09-07T07:34:42.3539824Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3539948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3539990Z result = _engine_run_backward( 2025-09-07T07:34:42.3540024Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3540172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3540293Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3540342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3541413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3541481Z return user_fn(self, *args) 2025-09-07T07:34:42.3541516Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3541662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3541704Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3541741Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3541901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3541945Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3541981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3542104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3542143Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3542178Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3542348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3542399Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3542439Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3542576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3542626Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3542670Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3542833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3542880Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3542919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3543077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3544076Z t = dispatch_trace( 2025-09-07T07:34:42.3544110Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3544227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3544272Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3544308Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3544432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3544505Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3544541Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3544704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3544782Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3544822Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3544949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3544986Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3545020Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3545146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3545188Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3545221Z ^^^^^^^^^ 2025-09-07T07:34:42.3545373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3545421Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3545455Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3545495Z File "", line 1, in 2025-09-07T07:34:42.3546641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3546756Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3546802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3546940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3546988Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3547024Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3547220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3547262Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3547297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3547468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3547514Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3547551Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3547694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3547735Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3547771Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3547905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3547995Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3548042Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3548168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3548228Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3548271Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3548424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3549411Z leaves = list(leaves) 2025-09-07T07:34:42.3549447Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3549572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3549607Z return func(x) 2025-09-07T07:34:42.3549639Z ^^^^^^^ 2025-09-07T07:34:42.3549821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3549887Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3549928Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3550094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3550138Z return func(*args, **kwargs) 2025-09-07T07:34:42.3550176Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3550356Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3550440Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3550442Z 2025-09-07T07:34:42.3550649Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3550653Z 2025-09-07T07:34:42.3550655Z 2025-09-07T07:34:42.3550727Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3550912Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3550915Z 2025-09-07T07:34:42.3550999Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3551094Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3551129Z inline_call [] 2025-09-07T07:34:42.3551184Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3551257Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3552272Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3552534Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3552649Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3552702Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3552854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3552944Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3553076Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3553195Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3553266Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3553300Z inline_call [] 2025-09-07T07:34:42.3553355Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3553426Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3553496Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3553747Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3553881Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3553930Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3554081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3554165Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3554327Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3554445Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3554495Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3554591Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3555576Z Traceback (most recent call last): 2025-09-07T07:34:42.3555716Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3555751Z self._run_test( 2025-09-07T07:34:42.3555865Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3555921Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3555960Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3556095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3556140Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3556179Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3556330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3556375Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3556432Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3556637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3556681Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3556718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3556860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3556942Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3556981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3557132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3557176Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3557326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3558329Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3558370Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3558514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3558564Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3558603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3558719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3558786Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3558829Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3558955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3559017Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3559090Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3559230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3559274Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3559310Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3559448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3559487Z return aot_autograd( 2025-09-07T07:34:42.3559555Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3559693Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3559761Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3559806Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3559968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3560054Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3561107Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3561293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3561335Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3561524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3561563Z fx_g = _create_graph( 2025-09-07T07:34:42.3561599Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3561762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3561822Z fx_g = make_fx( 2025-09-07T07:34:42.3561857Z ^^^^^^^^ 2025-09-07T07:34:42.3562011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3562056Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3562094Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3562240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3562283Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3562318Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3562477Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3562513Z t = dispatch_trace( 2025-09-07T07:34:42.3562547Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3562659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3562703Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3562737Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3563810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3563851Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3563886Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3564047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3564128Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3564169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3564293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3564332Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3564366Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3564518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3564558Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3564593Z ^^^^^^^^^ 2025-09-07T07:34:42.3564726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3564765Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3564799Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3564982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3565031Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3565065Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3565223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3565285Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3565332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3565508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3566553Z outs_pair = fn(*args) 2025-09-07T07:34:42.3566590Z ^^^^^^^^^ 2025-09-07T07:34:42.3566764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3566832Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3566877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3567053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3567091Z outs_pair = fn(*args) 2025-09-07T07:34:42.3567154Z ^^^^^^^^^ 2025-09-07T07:34:42.3567332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3567391Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3567432Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3567626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3567697Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3567741Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3567913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3567950Z outs_pair = fn(*args) 2025-09-07T07:34:42.3568016Z ^^^^^^^^^ 2025-09-07T07:34:42.3568210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3568254Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3568289Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3568459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3568504Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3569489Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3569615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3569657Z return handle_torch_function( 2025-09-07T07:34:42.3569692Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3569834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3569939Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3569983Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3570150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3570191Z return func(*args, **kwargs) 2025-09-07T07:34:42.3570225Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3570382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3570423Z result = _engine_run_backward( 2025-09-07T07:34:42.3570459Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3570603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3570724Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3570776Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3570902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3570943Z return user_fn(self, *args) 2025-09-07T07:34:42.3570978Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3571123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3571167Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3572145Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3572304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3572346Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3572382Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3572523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3572564Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3572599Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3572764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3572815Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3572853Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3572992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3573040Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3573079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3573239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3573290Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3573328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3573487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3573524Z t = dispatch_trace( 2025-09-07T07:34:42.3573558Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3573671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3573714Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3573749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3574806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3574844Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3574880Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3575041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3575143Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3575183Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3575309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3575347Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3575380Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3575540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3575581Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3575615Z ^^^^^^^^^ 2025-09-07T07:34:42.3575764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3575813Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3575847Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3575889Z File "", line 1, in 
2025-09-07T07:34:42.3576036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3576113Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3576156Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3576292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3576339Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3577385Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3577578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3577621Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3577685Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3577862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3577905Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3577942Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3578084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3578126Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3578165Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3578300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3578387Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3578433Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3578559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3578620Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3578663Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3578787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3578826Z leaves = list(leaves) 2025-09-07T07:34:42.3578859Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3578984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3579018Z return func(x) 2025-09-07T07:34:42.3579051Z ^^^^^^^ 2025-09-07T07:34:42.3580131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3580197Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3580238Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3580435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3580475Z return func(*args, **kwargs) 2025-09-07T07:34:42.3580511Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3580692Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3580777Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3580816Z 2025-09-07T07:34:42.3581027Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3581030Z 2025-09-07T07:34:42.3581031Z 2025-09-07T07:34:42.3581103Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3581288Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3581294Z 2025-09-07T07:34:42.3581380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3581453Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3581488Z inline_call [] 2025-09-07T07:34:42.3581541Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3581616Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3581688Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3581948Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3582064Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3582133Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3582282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3583312Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3583444Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3583566Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3583637Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3583672Z inline_call [] 2025-09-07T07:34:42.3583724Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3583795Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3583865Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3584127Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3584239Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3584289Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3584440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3584524Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3584653Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3584772Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3584841Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3584895Z inline_call [] 2025-09-07T07:34:42.3584947Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3585018Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3585087Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3585366Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3585477Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3586468Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3586691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3586777Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3586908Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3587025Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3587244Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ad61b8683529ac7c.xml - 2025-09-07T07:34:42.3587304Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3587656Z FAILED [0.6878s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3587741Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3587772Z 2025-09-07T07:34:42.3587978Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3587981Z 2025-09-07T07:34:42.3587982Z 2025-09-07T07:34:42.3588054Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3588236Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3588241Z 2025-09-07T07:34:42.3588326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3588385Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3588455Z ============= 1 failed, 1 passed, 44 deselected, 2 rerun in 4.95s ============== 2025-09-07T07:34:42.3588490Z Got exit code 1 2025-09-07T07:34:42.3588529Z Retrying single test... 2025-09-07T07:34:42.3588957Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3588995Z import pkg_resources 2025-09-07T07:34:42.3589165Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-053c64d363e8fcee.xml 2025-09-07T07:34:42.3589223Z ============================= test session starts ============================== 2025-09-07T07:34:42.3590289Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3590329Z cachedir: .pytest_cache 2025-09-07T07:34:42.3590486Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3590562Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3590599Z configfile: pytest.ini 2025-09-07T07:34:42.3590760Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3590836Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3591057Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3591134Z Running 1 items in this shard 2025-09-07T07:34:42.3591137Z 2025-09-07T07:34:42.3591389Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:08.163667842 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3591495Z [W907 07:22:09.619529655 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3591600Z [W907 07:22:09.637285903 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3591646Z ('RERUN', {'yellow': True}) [0.8788s] [100%] 2025-09-07T07:34:42.3591891Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:09.495106506 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3591991Z [W907 07:22:09.509069530 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3592039Z ('RERUN', {'yellow': True}) [0.7551s] [100%] 2025-09-07T07:34:42.3592282Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:10.183868737 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3592383Z [W907 07:22:10.196060677 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3592437Z FAILED [0.7122s] [100%] 2025-09-07T07:34:42.3592439Z 2025-09-07T07:34:42.3592489Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3592584Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3593573Z Traceback (most recent call last): 2025-09-07T07:34:42.3593715Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3593750Z self._run_test( 2025-09-07T07:34:42.3593867Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3593922Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3593961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3594096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3594143Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3594184Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3594337Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3594383Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3594421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3594557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3594602Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3594639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3594782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3594864Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3594901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3595082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3595128Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3595277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3595330Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3596306Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3596546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3596598Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3596637Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3596753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3596819Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3596866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3596993Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3597055Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3597096Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3597235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3597281Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3597317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3597456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3597494Z return aot_autograd( 2025-09-07T07:34:42.3597529Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3597693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3597764Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3597809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3597970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3598052Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3599052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3599237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3599280Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3599466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3599510Z fx_g = _create_graph( 2025-09-07T07:34:42.3599544Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3599709Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3599742Z fx_g = make_fx( 2025-09-07T07:34:42.3599775Z ^^^^^^^^ 2025-09-07T07:34:42.3599927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3599975Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3600013Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3600362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3600406Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3600441Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3600602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3600668Z t = dispatch_trace( 2025-09-07T07:34:42.3600702Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3600814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3600855Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3600890Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3601053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3602048Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3602085Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3602247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3602325Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3602369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3602494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3602531Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3602566Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3602692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3602734Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3602769Z ^^^^^^^^^ 2025-09-07T07:34:42.3602905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3602944Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3602978Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3603129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3603199Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3603233Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3603389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3603452Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3603494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3603672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3603710Z outs_pair = fn(*args) 2025-09-07T07:34:42.3604681Z ^^^^^^^^^ 2025-09-07T07:34:42.3604855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3604923Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3604971Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3605145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3605183Z outs_pair = fn(*args) 2025-09-07T07:34:42.3605217Z ^^^^^^^^^ 2025-09-07T07:34:42.3605393Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3605454Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3605496Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3605691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3605761Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3605807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3605998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3606036Z outs_pair = fn(*args) 2025-09-07T07:34:42.3606069Z ^^^^^^^^^ 2025-09-07T07:34:42.3606261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3606330Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3606366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3606583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3606629Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3607609Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3607739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3607783Z return handle_torch_function( 2025-09-07T07:34:42.3607819Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3607959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3608034Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3608079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3608247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3608288Z return func(*args, **kwargs) 2025-09-07T07:34:42.3608323Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3608446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3608515Z result = _engine_run_backward( 2025-09-07T07:34:42.3608552Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3608697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3608821Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3608870Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3608998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3609039Z return user_fn(self, *args) 2025-09-07T07:34:42.3609075Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3609218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3609262Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3609298Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3610395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3610439Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3610476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3610599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3610638Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3610672Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3610840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3610890Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3610930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3611065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3611141Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3611179Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3611341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3611387Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3611425Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3611626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3611664Z t = dispatch_trace( 2025-09-07T07:34:42.3611698Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3611812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3611855Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3611889Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3612961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3612999Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3613034Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3613194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3613273Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3613312Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3613439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3613476Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3613510Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3613636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3613704Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3613738Z ^^^^^^^^^ 2025-09-07T07:34:42.3613888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3613936Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3613969Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3614010Z File "", line 1, in 2025-09-07T07:34:42.3614152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3614232Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3614276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3614412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3614458Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3614496Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3615627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3615671Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3615706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3615881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3615925Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3615962Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.3616104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3616147Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3616182Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3616317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3616427Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3616473Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3616656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3616716Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3616799Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3616928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3616965Z leaves = list(leaves) 2025-09-07T07:34:42.3616999Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3617121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3617157Z return func(x) 2025-09-07T07:34:42.3617192Z ^^^^^^^ 2025-09-07T07:34:42.3618281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3618346Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3618388Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3618554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3618597Z return func(*args, **kwargs) 2025-09-07T07:34:42.3618632Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3618816Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3618901Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3618903Z 2025-09-07T07:34:42.3619110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3619142Z 2025-09-07T07:34:42.3619144Z 2025-09-07T07:34:42.3619216Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3619400Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3619402Z 2025-09-07T07:34:42.3619488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3619563Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3619597Z inline_call [] 2025-09-07T07:34:42.3619651Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3619685Z inductor [] 2025-09-07T07:34:42.3619758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3619832Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3620092Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3620208Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3620258Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3620411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3621440Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3621573Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3621693Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3621818Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3621862Z Traceback (most recent call last): 2025-09-07T07:34:42.3622000Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3622034Z self._run_test( 2025-09-07T07:34:42.3622146Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3622200Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3622276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3622407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3622453Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3622490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3622640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3622687Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3622726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3622861Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3622907Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3622943Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3623087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3623166Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3624141Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3624295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3624339Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3624510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3624563Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3624602Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3624745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3624794Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3624835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3624951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3625016Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3625059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3625185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3625250Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3625291Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3625430Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3625473Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3625510Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3625650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3625689Z return aot_autograd( 2025-09-07T07:34:42.3625723Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3625858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3626945Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3626995Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3627183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3627266Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3627310Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3627532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3627575Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3627761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3627799Z fx_g = _create_graph( 2025-09-07T07:34:42.3627834Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3628000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3628037Z fx_g = make_fx( 2025-09-07T07:34:42.3628069Z ^^^^^^^^ 2025-09-07T07:34:42.3628220Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3628265Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3628303Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3628451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3628493Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3628527Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3628687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3628724Z t = dispatch_trace( 2025-09-07T07:34:42.3629724Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3629839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3629880Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3629914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3630038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3630078Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3630112Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3630277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3630354Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3630394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3630519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3630562Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3630596Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3630722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3630762Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3630797Z ^^^^^^^^^ 2025-09-07T07:34:42.3630930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3630974Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3631008Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3631158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3631206Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3631239Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3632324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3632413Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3632456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3632632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3632670Z outs_pair = fn(*args) 2025-09-07T07:34:42.3632704Z ^^^^^^^^^ 2025-09-07T07:34:42.3632915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3632981Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3633026Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3633198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3633238Z outs_pair = fn(*args) 2025-09-07T07:34:42.3633271Z ^^^^^^^^^ 2025-09-07T07:34:42.3633449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3633508Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3633550Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3633746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3633816Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3633861Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3634034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3634089Z outs_pair = fn(*args) 2025-09-07T07:34:42.3634123Z ^^^^^^^^^ 2025-09-07T07:34:42.3634312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3635301Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3635337Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3635509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3635553Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3635591Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3635716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3635758Z return handle_torch_function( 2025-09-07T07:34:42.3635795Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3635941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3636015Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3636059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3636226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3636267Z return func(*args, **kwargs) 2025-09-07T07:34:42.3636303Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3636427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3636468Z result = _engine_run_backward( 2025-09-07T07:34:42.3636572Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3636721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3636873Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3636922Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3637049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3638044Z return user_fn(self, *args) 2025-09-07T07:34:42.3638081Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3638267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3638310Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3638346Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3638504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3638550Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3638588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3638711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3638749Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3638784Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3638948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3639001Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3639041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3639177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3639225Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3639264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3639447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3639496Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3639534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3639696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3639733Z t = dispatch_trace( 2025-09-07T07:34:42.3640769Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3640886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3640929Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3640963Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3641089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3641126Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3641163Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3641325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3641403Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3641443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3641566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3641605Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3641639Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3641765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3641806Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3641840Z ^^^^^^^^^ 2025-09-07T07:34:42.3641988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3642058Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3642091Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3642132Z File "", line 1, in 2025-09-07T07:34:42.3642274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3643293Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3643338Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3643509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3643557Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3643595Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3643787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3643833Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3643867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3644039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3644083Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3644119Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3644264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3644306Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3644341Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3644474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3644561Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3644624Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3644751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3644810Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3644853Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3644978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3645016Z leaves = list(leaves) 2025-09-07T07:34:42.3645986Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3646110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3646144Z return func(x) 2025-09-07T07:34:42.3646177Z ^^^^^^^ 2025-09-07T07:34:42.3646314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3646381Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3646421Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3646663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3646703Z return func(*args, **kwargs) 2025-09-07T07:34:42.3646738Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3646922Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3647007Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3647009Z 2025-09-07T07:34:42.3647217Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3647220Z 2025-09-07T07:34:42.3647224Z 2025-09-07T07:34:42.3647321Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3647505Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3647508Z 2025-09-07T07:34:42.3647593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3647667Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3647701Z inline_call [] 2025-09-07T07:34:42.3647790Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3647825Z inductor [] 2025-09-07T07:34:42.3647898Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3648916Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3649174Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3649293Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3649343Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3649494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3649579Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3649713Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3649834Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3649905Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3649938Z inline_call [] 2025-09-07T07:34:42.3649992Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3650089Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3650160Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3650415Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3650528Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3650578Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3650728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3650813Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3650941Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3651061Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3651110Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3651207Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3652185Z Traceback (most recent call last): 2025-09-07T07:34:42.3652323Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3652360Z self._run_test( 2025-09-07T07:34:42.3652471Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3652525Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3652565Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3652697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3652763Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3652802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3652953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3652998Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3653037Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3653204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3653249Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3653285Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3653427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3653506Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3653546Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3653696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3653740Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3653889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3654880Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3654923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3655065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3655115Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3655153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3655268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3655356Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3655399Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3655524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3655587Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3655630Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3655774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3655818Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3655855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3655991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3656030Z return aot_autograd( 2025-09-07T07:34:42.3656066Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3656203Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3656270Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3656314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3656475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3656616Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3657608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3657792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3657833Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3658022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3658088Z fx_g = _create_graph( 2025-09-07T07:34:42.3658123Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3658286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3658320Z fx_g = make_fx( 2025-09-07T07:34:42.3658352Z ^^^^^^^^ 2025-09-07T07:34:42.3658539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3658584Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3658622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3658768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3658810Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3658848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3659008Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3659045Z t = dispatch_trace( 2025-09-07T07:34:42.3659078Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3659190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3659230Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3659265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3659391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3660366Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3660400Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3660563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3660663Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3660706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3660829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3660867Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3660901Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3661027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3661069Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3661104Z ^^^^^^^^^ 2025-09-07T07:34:42.3661235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3661275Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3661309Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3661459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3661511Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3661545Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3661701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3661763Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3661807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3661984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3662957Z outs_pair = fn(*args) 2025-09-07T07:34:42.3662993Z ^^^^^^^^^ 2025-09-07T07:34:42.3663164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3663233Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3663297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3663470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3663508Z outs_pair = fn(*args) 2025-09-07T07:34:42.3663541Z ^^^^^^^^^ 2025-09-07T07:34:42.3663744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3663803Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3663845Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3664038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3664107Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3664155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3664328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3664366Z outs_pair = fn(*args) 2025-09-07T07:34:42.3664400Z ^^^^^^^^^ 2025-09-07T07:34:42.3664589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3664633Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3664668Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3664839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3664884Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3665873Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3666002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3666044Z return handle_torch_function( 2025-09-07T07:34:42.3666079Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3666220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3666293Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3666340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3666573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3666614Z return func(*args, **kwargs) 2025-09-07T07:34:42.3666649Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3666773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3666818Z result = _engine_run_backward( 2025-09-07T07:34:42.3666854Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3667000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3667120Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3667169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3667297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3667338Z return user_fn(self, *args) 2025-09-07T07:34:42.3667373Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3667517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3667560Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3668567Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3668729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3668772Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3668807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3668930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3669008Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3669044Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3669210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3669261Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3669300Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3669437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3669488Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3669526Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3669686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3669734Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3669771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3669934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3669971Z t = dispatch_trace( 2025-09-07T07:34:42.3670004Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3670117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3670157Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3670216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3671279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3671318Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3671352Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3671513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3671592Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3671634Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3671758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3671796Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3671829Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3671954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3671997Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3672031Z ^^^^^^^^^ 2025-09-07T07:34:42.3672180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3672228Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3672260Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3672302Z File "", line 1, in 
2025-09-07T07:34:42.3672447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3672524Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3672568Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3672704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3672750Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3673742Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3673934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3673977Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3674011Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3674217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3674261Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3674297Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3674440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3674482Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3674516Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3674655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3674742Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3674787Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3674912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3674970Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3675015Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3675140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3675178Z leaves = list(leaves) 2025-09-07T07:34:42.3675211Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3675334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3675386Z return func(x) 2025-09-07T07:34:42.3675419Z ^^^^^^^ 2025-09-07T07:34:42.3676550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3676616Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3676657Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3676827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3676867Z return func(*args, **kwargs) 2025-09-07T07:34:42.3676903Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3677085Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3677169Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3677174Z 2025-09-07T07:34:42.3677382Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3677384Z 2025-09-07T07:34:42.3677386Z 2025-09-07T07:34:42.3677457Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3677640Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3677642Z 2025-09-07T07:34:42.3677730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3677803Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3677838Z inline_call [] 2025-09-07T07:34:42.3677891Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3677924Z inductor [] 2025-09-07T07:34:42.3677997Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3678100Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3678356Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3678471Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3678521Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3679668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3679755Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3679886Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3680005Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3680079Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3680112Z inline_call [] 2025-09-07T07:34:42.3680215Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3680287Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3680357Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3680613Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3680725Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3680774Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3680924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3681033Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3681163Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3681280Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3681349Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3681383Z inline_call [] 2025-09-07T07:34:42.3681436Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3681506Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3681575Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3681827Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3682886Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3682936Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3683084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3683166Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3683296Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3683413Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3683628Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-053c64d363e8fcee.xml - 2025-09-07T07:34:42.3683688Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3684059Z FAILED [0.7122s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3684142Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3684145Z 2025-09-07T07:34:42.3684384Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3684386Z 2025-09-07T07:34:42.3684388Z 2025-09-07T07:34:42.3684459Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3684641Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3684645Z 2025-09-07T07:34:42.3684729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3684788Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3684853Z ================== 1 failed, 245 deselected, 2 rerun in 2.52s ================== 2025-09-07T07:34:42.3684888Z Got exit code 1 2025-09-07T07:34:42.3684925Z Retrying single test... 2025-09-07T07:34:42.3685349Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3685387Z import pkg_resources 2025-09-07T07:34:42.3685556Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4f169a2855d4d3cc.xml 2025-09-07T07:34:42.3686635Z ============================= test session starts ============================== 2025-09-07T07:34:42.3686750Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3686788Z cachedir: .pytest_cache 2025-09-07T07:34:42.3686943Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3686987Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3687024Z configfile: pytest.ini 2025-09-07T07:34:42.3687188Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3687265Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3687483Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3687530Z Running 1 items in this shard 2025-09-07T07:34:42.3687532Z 2025-09-07T07:34:42.3687782Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:17.950832633 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3687888Z [W907 07:22:17.484786253 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3687990Z [W907 07:22:17.502119386 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3688038Z ('RERUN', {'yellow': True}) [0.9791s] [100%] 2025-09-07T07:34:42.3688279Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:18.306425031 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3688380Z [W907 07:22:18.338027493 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3688457Z ('RERUN', {'yellow': True}) [0.7744s] [100%] 2025-09-07T07:34:42.3688698Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True [W907 07:22:19.111159759 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3688796Z [W907 07:22:19.133062795 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3688833Z FAILED [0.7614s] [100%] 2025-09-07T07:34:42.3688835Z 2025-09-07T07:34:42.3688919Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3689975Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3690019Z Traceback (most recent call last): 2025-09-07T07:34:42.3690160Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3690194Z self._run_test( 2025-09-07T07:34:42.3690312Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3690367Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3690408Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3690542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3690587Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3690624Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3690778Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3690824Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3690862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3690998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3691073Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3691112Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3691253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3691335Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3691372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3691525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3691570Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3691720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3692714Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3692755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3692899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3692951Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3692988Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3693104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3693169Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3693213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3693340Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3693403Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3693443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3693583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3693649Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3693686Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3693823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3693863Z return aot_autograd( 2025-09-07T07:34:42.3693897Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3694032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3694130Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3694175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3694336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3695353Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3695402Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3695585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3695627Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3695813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3695851Z fx_g = _create_graph( 2025-09-07T07:34:42.3695888Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3696053Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3696086Z fx_g = make_fx( 2025-09-07T07:34:42.3696118Z ^^^^^^^^ 2025-09-07T07:34:42.3696271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3696338Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3696375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3696586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3696628Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3696664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3696825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3696862Z t = dispatch_trace( 2025-09-07T07:34:42.3696895Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3697007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3697047Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3697082Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3698153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3698195Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3698230Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3698392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3698469Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3698510Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3698636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3698673Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3698707Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3698832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3698873Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3698908Z ^^^^^^^^^ 2025-09-07T07:34:42.3699070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3699109Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3699144Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3699294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3699343Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3699417Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3699575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3699635Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3699679Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3699854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3700837Z outs_pair = fn(*args) 2025-09-07T07:34:42.3700871Z ^^^^^^^^^ 2025-09-07T07:34:42.3701043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3701109Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3701153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3701330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3701369Z outs_pair = fn(*args) 2025-09-07T07:34:42.3701402Z ^^^^^^^^^ 2025-09-07T07:34:42.3701579Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3701662Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3701707Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3701901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3701971Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3702016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3702191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3702228Z outs_pair = fn(*args) 2025-09-07T07:34:42.3702260Z ^^^^^^^^^ 2025-09-07T07:34:42.3702451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3702495Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3702531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3702699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3703677Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3703715Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3703842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3703884Z return handle_torch_function( 2025-09-07T07:34:42.3703921Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3704061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3704135Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3704179Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3704369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3704409Z return func(*args, **kwargs) 2025-09-07T07:34:42.3704444Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3704567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3704608Z result = _engine_run_backward( 2025-09-07T07:34:42.3704643Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3704820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3704941Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3704990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3705116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3705160Z return user_fn(self, *args) 2025-09-07T07:34:42.3705195Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3705343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3705385Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3706356Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3706591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3706635Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3706670Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3706794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3706833Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3706898Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3707067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3707118Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3707158Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3707293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3707342Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3707381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3707543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3707590Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3707629Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3707786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3707827Z t = dispatch_trace( 2025-09-07T07:34:42.3707860Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3707971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3708012Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3708996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3709124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3709162Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3709196Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3709358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3709435Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3709478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3709631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3709669Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3709702Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3709826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3709868Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3709900Z ^^^^^^^^^ 2025-09-07T07:34:42.3710096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3710144Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3710178Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3710218Z File "", line 1, in 2025-09-07T07:34:42.3710363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3710443Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3710488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3710622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3710669Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3711652Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3711847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3711889Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3711924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3712096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3712162Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3712200Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.3712344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3712385Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3712421Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3712556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3712645Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3712690Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3712815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3712873Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3712919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3713046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3713084Z leaves = list(leaves) 2025-09-07T07:34:42.3713118Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3713239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3713274Z return func(x) 2025-09-07T07:34:42.3713305Z ^^^^^^^ 2025-09-07T07:34:42.3714376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3714441Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3714482Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3714648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3714692Z return func(*args, **kwargs) 2025-09-07T07:34:42.3714749Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3714929Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3715013Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3715015Z 2025-09-07T07:34:42.3715253Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3715256Z 2025-09-07T07:34:42.3715258Z 2025-09-07T07:34:42.3715328Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3715512Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3715514Z 2025-09-07T07:34:42.3715599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3715675Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3715709Z inline_call [] 2025-09-07T07:34:42.3715764Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3715797Z inductor [] 2025-09-07T07:34:42.3715871Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3715941Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3716204Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3716319Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3716370Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3717521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3717646Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3717776Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3717894Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3717990Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3718036Z Traceback (most recent call last): 2025-09-07T07:34:42.3718171Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3718206Z self._run_test( 2025-09-07T07:34:42.3718318Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3718373Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3718416Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3718549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3718593Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3718632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3718782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3718828Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3718867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3719004Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3719046Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3719082Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3719225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3720353Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3720393Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3720546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3720589Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3720779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3720833Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3720872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3721014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3721064Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3721105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3721224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3721289Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3721331Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3721458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3721521Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3721562Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3721701Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3721745Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3721781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3721941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3721980Z return aot_autograd( 2025-09-07T07:34:42.3722015Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3723089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3723160Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3723204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3723367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3723449Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3723494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3723675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3723722Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3723907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3723946Z fx_g = _create_graph( 2025-09-07T07:34:42.3723980Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3724146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3724179Z fx_g = make_fx( 2025-09-07T07:34:42.3724210Z ^^^^^^^^ 2025-09-07T07:34:42.3724363Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3724408Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3724446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3724595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3724655Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3724691Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3724849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3724885Z t = dispatch_trace( 2025-09-07T07:34:42.3725853Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3725996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3726039Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3726073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3726197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3726236Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3726270Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3726434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3726590Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3726630Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3726754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3726792Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3726828Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3726953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3726994Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3727027Z ^^^^^^^^^ 2025-09-07T07:34:42.3727158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3727222Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3727257Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3727408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3727455Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3728434Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3728591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3728655Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3728698Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3728873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3728911Z outs_pair = fn(*args) 2025-09-07T07:34:42.3728945Z ^^^^^^^^^ 2025-09-07T07:34:42.3729119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3729187Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3729229Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3729403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3729440Z outs_pair = fn(*args) 2025-09-07T07:34:42.3729476Z ^^^^^^^^^ 2025-09-07T07:34:42.3729651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3729709Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3729751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3729945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3730043Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3730088Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3730259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3730296Z outs_pair = fn(*args) 2025-09-07T07:34:42.3730374Z ^^^^^^^^^ 2025-09-07T07:34:42.3731509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3731554Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3731590Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3731759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3731807Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3731843Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3731967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3732009Z return handle_torch_function( 2025-09-07T07:34:42.3732044Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3732187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3732260Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3732305Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3732470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3732531Z return func(*args, **kwargs) 2025-09-07T07:34:42.3732568Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3732691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3732731Z result = _engine_run_backward( 2025-09-07T07:34:42.3732766Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3732911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3733032Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3733080Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3734146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3734187Z return user_fn(self, *args) 2025-09-07T07:34:42.3734222Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3734371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3734414Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3734449Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3734605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3734648Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3734684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3734808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3734847Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3734881Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3735047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3735097Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3735155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3735291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3735339Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3735378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3735569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3735616Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3735654Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3735812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3736849Z t = dispatch_trace( 2025-09-07T07:34:42.3736884Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3736997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3737042Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3737077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3737201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3737238Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3737273Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3737437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3737515Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3737554Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3737677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3737739Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3737774Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3737900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3737941Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3737974Z ^^^^^^^^^ 2025-09-07T07:34:42.3738124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3738172Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3738207Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3738248Z File "", line 1, in 2025-09-07T07:34:42.3738391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3739411Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3739456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3739596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3739642Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3739680Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3739870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3739913Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3739949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3740120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3740162Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3740198Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3740341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3740412Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3740446Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3740580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3740666Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3740711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3740876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3740936Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3740978Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3741104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3742080Z leaves = list(leaves) 2025-09-07T07:34:42.3742118Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3742241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3742275Z return func(x) 2025-09-07T07:34:42.3742307Z ^^^^^^^ 2025-09-07T07:34:42.3742445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3742509Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3742552Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3742718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3742758Z return func(*args, **kwargs) 2025-09-07T07:34:42.3742793Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3742974Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3743081Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3743083Z 2025-09-07T07:34:42.3743292Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3743295Z 2025-09-07T07:34:42.3743297Z 2025-09-07T07:34:42.3743368Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3743555Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3743557Z 2025-09-07T07:34:42.3743641Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3743715Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3743748Z inline_call [] 2025-09-07T07:34:42.3743803Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3743837Z inductor [] 2025-09-07T07:34:42.3743909Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3744915Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3745172Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3745289Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3745338Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3745490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3745575Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3745708Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3745848Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3745919Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3745952Z inline_call [] 2025-09-07T07:34:42.3746006Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3746076Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3746177Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3746433Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3746613Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3746665Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3746814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3746897Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3747026Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3747145Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3747195Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3748236Z _ WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.3748280Z Traceback (most recent call last): 2025-09-07T07:34:42.3748416Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3748479Z self._run_test( 2025-09-07T07:34:42.3748591Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3748646Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3748686Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3748817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3748860Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3748901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3749051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3749097Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3749134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3749270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3749316Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3749351Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3749497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3749577Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3749615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3749766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3749811Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3749959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3750946Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3750988Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3751166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3751216Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3751255Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3751369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3751435Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3751527Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3751654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3751716Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3751757Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3751897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3751942Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3751978Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3752118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3752156Z return aot_autograd( 2025-09-07T07:34:42.3752191Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3752328Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3752397Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3752442Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3752602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3753640Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3753688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3753868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3753910Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3754097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3754137Z fx_g = _create_graph( 2025-09-07T07:34:42.3754172Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3754334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3754368Z fx_g = make_fx( 2025-09-07T07:34:42.3754399Z ^^^^^^^^ 2025-09-07T07:34:42.3754549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3754598Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3754636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3754783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3754825Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3754861Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3755023Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3755060Z t = dispatch_trace( 2025-09-07T07:34:42.3755094Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3755206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3755247Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3755282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3756360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3756399Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3756434Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3756659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3756737Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3756819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3756944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3756981Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3757015Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3757141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3757183Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3757219Z ^^^^^^^^^ 2025-09-07T07:34:42.3757350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3757390Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3757424Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3757572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3757622Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3757656Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3757812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3757873Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3757916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3759055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3759096Z outs_pair = fn(*args) 2025-09-07T07:34:42.3759130Z ^^^^^^^^^ 2025-09-07T07:34:42.3759303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3759368Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3759414Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3759586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3759623Z outs_pair = fn(*args) 2025-09-07T07:34:42.3759657Z ^^^^^^^^^ 2025-09-07T07:34:42.3759834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3759895Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3759936Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3760131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3760257Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3760305Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3760476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3760515Z outs_pair = fn(*args) 2025-09-07T07:34:42.3760549Z ^^^^^^^^^ 2025-09-07T07:34:42.3760738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3760811Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3760846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3761015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3762004Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3762040Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3762198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3762240Z return handle_torch_function( 2025-09-07T07:34:42.3762275Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3762417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3762492Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3762538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3762706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3762747Z return func(*args, **kwargs) 2025-09-07T07:34:42.3762780Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3762904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3762944Z result = _engine_run_backward( 2025-09-07T07:34:42.3762982Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3763126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3763246Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3763295Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3763439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3763479Z return user_fn(self, *args) 2025-09-07T07:34:42.3763515Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3763658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3764635Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3764672Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3764832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3764874Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3764910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3765032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3765074Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3765109Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3765277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3765327Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3765367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3765503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3765553Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3765591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3765751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3765797Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3765835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3766011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3766047Z t = dispatch_trace( 2025-09-07T07:34:42.3766081Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3766193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3766234Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3767282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3767451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3767490Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3767524Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3767685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3767763Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3767806Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3767929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3767965Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3767999Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3768124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3768164Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3768199Z ^^^^^^^^^ 2025-09-07T07:34:42.3768350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3768399Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3768431Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3768472Z File "", line 1, in 
2025-09-07T07:34:42.3768615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3768721Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3768765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3768900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3768946Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3769931Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3770125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3770168Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3770202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3770373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3770419Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3770456Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3770597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3770639Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3770673Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3770810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3770897Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3770942Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3771065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3771124Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3771192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3771320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3771357Z leaves = list(leaves) 2025-09-07T07:34:42.3771392Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3771514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3771548Z return func(x) 2025-09-07T07:34:42.3772513Z ^^^^^^^ 2025-09-07T07:34:42.3772684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3772748Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3772788Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3772955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3772998Z return func(*args, **kwargs) 2025-09-07T07:34:42.3773033Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3773213Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3773297Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3773299Z 2025-09-07T07:34:42.3773507Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3773511Z 2025-09-07T07:34:42.3773512Z 2025-09-07T07:34:42.3773584Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3773768Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3773786Z 2025-09-07T07:34:42.3773872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3773945Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3773978Z inline_call [] 2025-09-07T07:34:42.3774031Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3774065Z inductor [] 2025-09-07T07:34:42.3774137Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3774210Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3774469Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3774583Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3774633Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3775723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3775807Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3775938Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3776055Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3776127Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3776161Z inline_call [] 2025-09-07T07:34:42.3776215Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3776286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3776354Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3776681Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3776824Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3776873Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3777023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3777140Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3777270Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3777387Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3777459Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3777494Z inline_call [] 2025-09-07T07:34:42.3777548Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3777619Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3777688Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3778892Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3779007Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3779055Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3779203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3779286Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3779438Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3779555Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3779770Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4f169a2855d4d3cc.xml - 2025-09-07T07:34:42.3779827Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3780174Z FAILED [0.7614s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3780257Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3780259Z 2025-09-07T07:34:42.3780465Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3780469Z 2025-09-07T07:34:42.3780471Z 2025-09-07T07:34:42.3780541Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3780725Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.3780727Z 2025-09-07T07:34:42.3780814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3780873Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.3780938Z ================== 1 failed, 245 deselected, 2 rerun in 2.69s ================== 2025-09-07T07:34:42.3780974Z Got exit code 1 2025-09-07T07:34:42.3781096Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.3781519Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3781575Z import pkg_resources 2025-09-07T07:34:42.3782688Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f54a8ca4b848ff8b.xml 2025-09-07T07:34:42.3782772Z ============================= test session starts ============================== 2025-09-07T07:34:42.3782885Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3782923Z cachedir: .pytest_cache 2025-09-07T07:34:42.3783079Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3783125Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3783166Z configfile: pytest.ini 2025-09-07T07:34:42.3783326Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3783401Z collecting ... collected 467 items / 46 deselected / 421 selected 2025-09-07T07:34:42.3783450Z stepcurrent: skipping 46 already run items. 
2025-09-07T07:34:42.3783492Z Running 200 items in this shard 2025-09-07T07:34:42.3783494Z 2025-09-07T07:34:42.3783750Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:26.555392717 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3783856Z [W907 07:22:27.648633719 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3783958Z [W907 07:22:27.763416883 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3784022Z ('RERUN', {'yellow': True}) [0.6752s] [ 0%] 2025-09-07T07:34:42.3784268Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:27.097619864 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3784369Z [W907 07:22:27.097895520 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3784414Z ('RERUN', {'yellow': True}) [0.2485s] [ 0%] 2025-09-07T07:34:42.3784660Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:27.357428034 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3784760Z [W907 07:22:27.357724111 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3784796Z FAILED [0.2553s] [ 0%] 2025-09-07T07:34:42.3784798Z 2025-09-07T07:34:42.3784848Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3785879Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3785922Z Traceback (most recent call last): 2025-09-07T07:34:42.3786062Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3786095Z self._run_test( 2025-09-07T07:34:42.3786209Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3786266Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3786306Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3786439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3786550Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3786588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3786743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3786814Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3786853Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3786988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3787032Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3787069Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3787254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3787335Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3787373Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3787524Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3787572Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3788670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3788725Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3788765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3788907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3788959Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3788997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3789113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3789178Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3789222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3789374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3789440Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3789480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3789620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3789663Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3789700Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3789840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3789879Z return aot_autograd( 2025-09-07T07:34:42.3789913Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3790050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.3790121Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3790167Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3790326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3791347Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3791392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3791578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3791620Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3791806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3791845Z fx_g = _create_graph( 2025-09-07T07:34:42.3791883Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3792066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3792100Z fx_g = make_fx( 2025-09-07T07:34:42.3792132Z ^^^^^^^^ 2025-09-07T07:34:42.3792284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3792328Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3792364Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3792542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3792585Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3792621Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3792781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.3792820Z t = dispatch_trace( 2025-09-07T07:34:42.3792853Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3792965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3793004Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3793974Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3794099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3794139Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3794177Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3794338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3794416Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3794456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3794580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3794639Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3794673Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3794801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3794841Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3794875Z ^^^^^^^^^ 2025-09-07T07:34:42.3795007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3795048Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3795082Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3795229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3795279Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3795311Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3795467Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3795529Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3795574Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3796740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3796781Z outs_pair = fn(*args) 2025-09-07T07:34:42.3796816Z ^^^^^^^^^ 2025-09-07T07:34:42.3796989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3797055Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3797099Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3797271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3797339Z outs_pair = fn(*args) 2025-09-07T07:34:42.3797372Z ^^^^^^^^^ 2025-09-07T07:34:42.3797549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3797608Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3797650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3797878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3797949Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3797993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3798165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3798207Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.3798241Z ^^^^^^^^^ 2025-09-07T07:34:42.3798429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3798473Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3798509Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3799627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3799673Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3799710Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3799834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3799902Z return handle_torch_function( 2025-09-07T07:34:42.3799939Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3800080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3800204Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3800250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3800418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3800459Z return func(*args, **kwargs) 2025-09-07T07:34:42.3800494Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3800617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3800658Z result = _engine_run_backward( 2025-09-07T07:34:42.3800693Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3800839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3800963Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.3801012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3801137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3801179Z return user_fn(self, *args) 2025-09-07T07:34:42.3801218Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3801363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3802354Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3802391Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3802548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3802615Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3802650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3802775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3802813Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3802849Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3803047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3803098Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3803137Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3803273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3803321Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3803361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3803523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3803570Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.3803608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3803766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3803804Z t = dispatch_trace( 2025-09-07T07:34:42.3803838Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3803951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3803992Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3804968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3805091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3805151Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3805187Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3805346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3805423Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3805464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3805586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3805626Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3805659Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3805787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3805827Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3805861Z ^^^^^^^^^ 2025-09-07T07:34:42.3806009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3806061Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3806094Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3806136Z File "", line 1, in 2025-09-07T07:34:42.3806279Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3806356Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3806403Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3806592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3807578Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3807616Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3807807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3807880Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3807915Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3808085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3808130Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3808165Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3808357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3808398Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3808434Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3808567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3808654Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3808701Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3808829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3808888Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.3808931Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3809058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3809097Z leaves = list(leaves) 2025-09-07T07:34:42.3809131Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3809254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3809288Z return func(x) 2025-09-07T07:34:42.3810261Z ^^^^^^^ 2025-09-07T07:34:42.3810399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3810490Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3810531Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3810697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3810738Z return func(*args, **kwargs) 2025-09-07T07:34:42.3810773Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3810954Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3811039Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3811042Z 2025-09-07T07:34:42.3811246Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3811252Z 2025-09-07T07:34:42.3811254Z 2025-09-07T07:34:42.3811326Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3811510Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3811513Z 2025-09-07T07:34:42.3811597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3811671Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3811707Z inline_call [] 2025-09-07T07:34:42.3811761Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3811795Z inductor [] 2025-09-07T07:34:42.3811868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3811939Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3812198Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3812331Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3813315Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3813468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3813588Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3813720Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3813837Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3813936Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3813979Z Traceback (most recent call last): 2025-09-07T07:34:42.3814117Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3814152Z self._run_test( 2025-09-07T07:34:42.3814263Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3814317Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3814356Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3814489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3814534Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3814573Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3814722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3814768Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3814821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3814960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3815003Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3815040Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3815180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3816196Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3816234Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3816387Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3816431Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3816642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3816698Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3816738Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3816878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3816928Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3816965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3817083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3817147Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3817190Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3817315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3817377Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3817442Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3817582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3817626Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3817662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3817801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3817839Z return aot_autograd( 2025-09-07T07:34:42.3818859Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3818996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.3819064Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3819108Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3819268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3819353Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3819398Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3819578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3819621Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3819807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3819847Z fx_g = _create_graph( 2025-09-07T07:34:42.3819881Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3820043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3820102Z fx_g = make_fx( 2025-09-07T07:34:42.3820134Z ^^^^^^^^ 2025-09-07T07:34:42.3820284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3820330Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3820367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3820513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3820555Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3820591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3820749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.3821726Z t = dispatch_trace( 2025-09-07T07:34:42.3821760Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3821875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3821919Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3821953Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3822078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3822116Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3822151Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3822312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3822393Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3822433Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3822556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3822593Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3822628Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3822775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3822816Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3822850Z ^^^^^^^^^ 2025-09-07T07:34:42.3822984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3823023Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3823058Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3823232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3824215Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3824248Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3824405Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3824465Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3824512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3824688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3824726Z outs_pair = fn(*args) 2025-09-07T07:34:42.3824760Z ^^^^^^^^^ 2025-09-07T07:34:42.3824931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3825001Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3825044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3825217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3825255Z outs_pair = fn(*args) 2025-09-07T07:34:42.3825307Z ^^^^^^^^^ 2025-09-07T07:34:42.3825484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3825544Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3825585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3825780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3825849Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3825893Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3826065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3826103Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.3827153Z ^^^^^^^^^ 2025-09-07T07:34:42.3827347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3827390Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3827426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3827594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3827640Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3827677Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3827805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3827846Z return handle_torch_function( 2025-09-07T07:34:42.3827882Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3828022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3828133Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3828176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3828343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3828383Z return func(*args, **kwargs) 2025-09-07T07:34:42.3828418Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3828585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3828626Z result = _engine_run_backward( 2025-09-07T07:34:42.3828662Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3828806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3828928Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.3829917Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3830042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3830084Z return user_fn(self, *args) 2025-09-07T07:34:42.3830119Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3830263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3830308Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3830343Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3830502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3830545Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3830580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3830729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3830769Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3830804Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3830969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3831019Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3831058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3831195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3831244Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3831281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3831441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3831490Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.3831529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3832616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3832655Z t = dispatch_trace( 2025-09-07T07:34:42.3832688Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3832801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3832844Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3832880Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3833002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3833040Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3833074Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3833234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3833336Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3833376Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3833499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3833536Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3833569Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3833727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3833769Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3833802Z ^^^^^^^^^ 2025-09-07T07:34:42.3833951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3833999Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3834033Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3834074Z File "", line 1, in 2025-09-07T07:34:42.3835149Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3835227Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3835272Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3835407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3835456Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3835493Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3835685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3835727Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3835781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3835954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3835998Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3836034Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3836177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3836219Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3836255Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3836389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3836476Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3836586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3836714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3836774Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.3836816Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3837876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3837914Z leaves = list(leaves) 2025-09-07T07:34:42.3837948Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3838071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3838106Z return func(x) 2025-09-07T07:34:42.3838138Z ^^^^^^^ 2025-09-07T07:34:42.3838275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3838338Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3838379Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3838573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3838613Z return func(*args, **kwargs) 2025-09-07T07:34:42.3838648Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3838827Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3838911Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3838951Z 2025-09-07T07:34:42.3839158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3839161Z 2025-09-07T07:34:42.3839162Z 2025-09-07T07:34:42.3839233Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3839421Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3839427Z 2025-09-07T07:34:42.3839511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3839583Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3839617Z inline_call [] 2025-09-07T07:34:42.3839672Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3839705Z inductor [] 2025-09-07T07:34:42.3840765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3840838Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3841099Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3841212Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3841290Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3841439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3841525Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3841655Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3841776Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3841845Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3841879Z inline_call [] 2025-09-07T07:34:42.3841932Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3842004Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3842075Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3842328Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3842439Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3842489Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3842638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3842721Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3842850Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3842967Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3843974Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3844075Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3844117Z Traceback (most recent call last): 2025-09-07T07:34:42.3844253Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3844288Z self._run_test( 2025-09-07T07:34:42.3844440Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3844496Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3844536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3844668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3844713Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3844755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3844909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3844955Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3844992Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3845128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3845171Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3845209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3845352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3845432Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3845469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3845620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3845689Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3846833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3846887Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3846928Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3847073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3847124Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3847162Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3847278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3847343Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3847390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3847516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3847577Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3847618Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3847757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3847802Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3847840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3847978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3848016Z return aot_autograd( 2025-09-07T07:34:42.3848051Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3848186Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3848285Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3848330Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3849430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3849513Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3849602Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3849787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3849830Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3850015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3850057Z fx_g = _create_graph( 2025-09-07T07:34:42.3850091Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3850254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3850287Z fx_g = make_fx( 2025-09-07T07:34:42.3850319Z ^^^^^^^^ 2025-09-07T07:34:42.3850470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3850517Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3850554Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3850699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3850740Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3850777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3850934Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3850997Z t = dispatch_trace( 2025-09-07T07:34:42.3851030Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3851141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3851182Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3852150Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3852276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3852316Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3852352Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3852513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3852592Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3852632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3852759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3852797Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3852831Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3852957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3852998Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3853031Z ^^^^^^^^^ 2025-09-07T07:34:42.3853167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3853206Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3853241Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3853388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3853438Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3853491Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3853647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3853708Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3854681Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3854856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3854931Z outs_pair = fn(*args) 2025-09-07T07:34:42.3854965Z ^^^^^^^^^ 2025-09-07T07:34:42.3855137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3855202Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3855244Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3855419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3855456Z outs_pair = fn(*args) 2025-09-07T07:34:42.3855490Z ^^^^^^^^^ 2025-09-07T07:34:42.3855667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3855726Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3855769Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3855964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3856032Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3856077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3856269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3856307Z outs_pair = fn(*args) 2025-09-07T07:34:42.3856340Z ^^^^^^^^^ 2025-09-07T07:34:42.3856589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3856633Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3856671Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3857780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3857827Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3857862Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3857988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3858032Z return handle_torch_function( 2025-09-07T07:34:42.3858068Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3858208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3858283Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3858327Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3858496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3858536Z return func(*args, **kwargs) 2025-09-07T07:34:42.3858571Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3858693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3858734Z result = _engine_run_backward( 2025-09-07T07:34:42.3858771Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3858946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3859069Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3859117Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3859243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3859323Z return user_fn(self, *args) 2025-09-07T07:34:42.3859360Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3859504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3860489Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3860524Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3860682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3860728Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3860764Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3860886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3860925Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3860959Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3861125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3861176Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3861215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3861351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3861424Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3861464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3861624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3861672Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3861711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3861871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3861911Z t = dispatch_trace( 2025-09-07T07:34:42.3861944Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3862055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3863032Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3863069Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3863192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3863232Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3863267Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3863426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3863504Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3863544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3863672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3863709Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3863743Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3863868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3863909Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3863945Z ^^^^^^^^^ 2025-09-07T07:34:42.3864114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3864161Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3864195Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3864235Z File "", line 1, in 
2025-09-07T07:34:42.3864378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3864485Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3864531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3864668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3865652Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3865690Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3865884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3865928Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3865963Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3866134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3866178Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3866215Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3866359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3866401Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3866435Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3866631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3866748Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3866793Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3866917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3866977Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3867019Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3867146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3867184Z leaves = list(leaves) 2025-09-07T07:34:42.3867218Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3867340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3868316Z return func(x) 2025-09-07T07:34:42.3868349Z ^^^^^^^ 2025-09-07T07:34:42.3868490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3868553Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3868594Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3868761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3868802Z return func(*args, **kwargs) 2025-09-07T07:34:42.3868838Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3869019Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3869103Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3869105Z 2025-09-07T07:34:42.3869310Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3869338Z 2025-09-07T07:34:42.3869340Z 2025-09-07T07:34:42.3869411Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3869599Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3869601Z 2025-09-07T07:34:42.3869685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3869798Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3869832Z inline_call [] 2025-09-07T07:34:42.3869886Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3869919Z inductor [] 2025-09-07T07:34:42.3869992Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3870063Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3870324Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3870437Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3871422Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3871573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3871661Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3871793Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3871913Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3871983Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3872038Z inline_call [] 2025-09-07T07:34:42.3872092Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3872163Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3872232Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3872488Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3872600Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3872650Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3872798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3872882Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3873012Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3873129Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3873198Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3873231Z inline_call [] 2025-09-07T07:34:42.3873284Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3873355Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3874352Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3874603Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3874714Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3874784Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3874933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3875017Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3875144Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3875292Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3875508Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f54a8ca4b848ff8b.xml - 2025-09-07T07:34:42.3875565Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3875917Z FAILED [0.2553s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3876004Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3876007Z 2025-09-07T07:34:42.3876211Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3876215Z 2025-09-07T07:34:42.3876217Z 2025-09-07T07:34:42.3876288Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3876472Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3876474Z 2025-09-07T07:34:42.3876604Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3876693Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3876757Z ================== 1 failed, 46 deselected, 2 rerun in 1.44s =================== 2025-09-07T07:34:42.3876791Z Got exit code 1 2025-09-07T07:34:42.3876829Z Retrying single test... 2025-09-07T07:34:42.3877255Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3878244Z import pkg_resources 2025-09-07T07:34:42.3878414Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-07ed59d50b0fe7d1.xml 2025-09-07T07:34:42.3878470Z ============================= test session starts ============================== 2025-09-07T07:34:42.3878586Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3878624Z cachedir: .pytest_cache 2025-09-07T07:34:42.3878783Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3878834Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3878871Z configfile: pytest.ini 2025-09-07T07:34:42.3879037Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3879112Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3879334Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3879375Z Running 1 items in this shard 2025-09-07T07:34:42.3879378Z 2025-09-07T07:34:42.3879633Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:36.021948440 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3879764Z [W907 07:22:36.115520457 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3879867Z [W907 07:22:36.234696586 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3879913Z ('RERUN', {'yellow': True}) [0.7999s] [100%] 2025-09-07T07:34:42.3880250Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:37.671157806 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3880350Z [W907 07:22:37.671448012 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3880396Z ('RERUN', {'yellow': True}) [0.2183s] [100%] 2025-09-07T07:34:42.3880644Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:37.896668573 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3880746Z [W907 07:22:37.897030459 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3880782Z FAILED [0.2225s] [100%] 2025-09-07T07:34:42.3880784Z 2025-09-07T07:34:42.3881781Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3881883Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3881927Z Traceback (most recent call last): 2025-09-07T07:34:42.3882066Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3882101Z self._run_test( 2025-09-07T07:34:42.3882213Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3882288Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3882328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3882462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3882506Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3882543Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3882697Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3882743Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3882781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3882916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3882960Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3882996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3883143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3883223Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3883262Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3883414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3884393Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3884546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3884599Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3884638Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3884780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3884851Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3884889Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3885005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3885070Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3885113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3885274Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3885336Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3885377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3885516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3885560Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3885598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3885736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3885774Z return aot_autograd( 2025-09-07T07:34:42.3885808Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3885943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3886012Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3886059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3887221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3887306Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3887351Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3887569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3887614Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3887800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3887839Z fx_g = _create_graph( 2025-09-07T07:34:42.3887874Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3888040Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3888074Z fx_g = make_fx( 2025-09-07T07:34:42.3888106Z ^^^^^^^^ 2025-09-07T07:34:42.3888257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3888302Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3888342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3888488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3888531Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3888566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3888724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3888761Z t = dispatch_trace( 2025-09-07T07:34:42.3888796Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3888908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3889891Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3889926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3890051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3890093Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3890155Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3890319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3890397Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3890437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3890562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3890634Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3890668Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3890794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3890834Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3890869Z ^^^^^^^^^ 2025-09-07T07:34:42.3891001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3891044Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3891078Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3891228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3891277Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3891310Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3891468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3891530Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3892505Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3892682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3892742Z outs_pair = fn(*args) 2025-09-07T07:34:42.3892781Z ^^^^^^^^^ 2025-09-07T07:34:42.3892952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3893019Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3893062Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3893241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3893278Z outs_pair = fn(*args) 2025-09-07T07:34:42.3893311Z ^^^^^^^^^ 2025-09-07T07:34:42.3893488Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3893547Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3893589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3893787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3893857Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3893901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3894074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3894113Z outs_pair = fn(*args) 2025-09-07T07:34:42.3894147Z ^^^^^^^^^ 2025-09-07T07:34:42.3894334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3894378Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3894413Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3895512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3895578Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3895615Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3895740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3895782Z return handle_torch_function( 2025-09-07T07:34:42.3895817Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3895985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3896060Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3896105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3896272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3896317Z return func(*args, **kwargs) 2025-09-07T07:34:42.3896351Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3896476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3896557Z result = _engine_run_backward( 2025-09-07T07:34:42.3896593Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3896739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3896862Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3896910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3897035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3897101Z return user_fn(self, *args) 2025-09-07T07:34:42.3897139Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3898224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3898268Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3898303Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3898462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3898507Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3898542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3898664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3898702Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3898737Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3898902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3898956Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3898995Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3899133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3899181Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3899220Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3899382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3899429Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3899467Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3899624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3899664Z t = dispatch_trace( 2025-09-07T07:34:42.3899723Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3899836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3900813Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3900848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3900972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3901009Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3901092Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3901252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3901331Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3901370Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3901494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3901534Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3901568Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3901694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3901734Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3901768Z ^^^^^^^^^ 2025-09-07T07:34:42.3901916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3901967Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3902000Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3902041Z File "", line 1, in 2025-09-07T07:34:42.3902184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3902260Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3902325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3903397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3903444Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3903482Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3903672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3903717Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3903751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3903922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3903964Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3904000Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.3904147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3904189Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3904223Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3904357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3904443Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3904491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3904616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3904675Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3904718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3904843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3904908Z leaves = list(leaves) 2025-09-07T07:34:42.3904941Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3905065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3906028Z return func(x) 2025-09-07T07:34:42.3906062Z ^^^^^^^ 2025-09-07T07:34:42.3906198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3906295Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3906336Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3906548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3906588Z return func(*args, **kwargs) 2025-09-07T07:34:42.3906624Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3906804Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3906891Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3906893Z 2025-09-07T07:34:42.3907098Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3907101Z 2025-09-07T07:34:42.3907102Z 2025-09-07T07:34:42.3907176Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3907362Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3907364Z 2025-09-07T07:34:42.3907449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3907523Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3907586Z inline_call [] 2025-09-07T07:34:42.3907639Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3907673Z inductor [] 2025-09-07T07:34:42.3907746Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3907817Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3908078Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.3909142Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3909193Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3909344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3909431Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3909563Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.3909681Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3909780Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3909822Z Traceback (most recent call last): 2025-09-07T07:34:42.3909959Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3909993Z self._run_test( 2025-09-07T07:34:42.3910104Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3910158Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3910198Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3910330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3910401Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3910439Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3910589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3910634Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3910673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3910843Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3910886Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3910923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3912004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3912087Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3912128Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3912279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3912323Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3912474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3912526Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3912568Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3912708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3912759Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3912797Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3912913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3912998Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3913042Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3913166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3913229Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3913270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3913410Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3913452Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3913489Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3913625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3914602Z return aot_autograd( 2025-09-07T07:34:42.3914638Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3914775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3914842Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3914887Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3915047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3915132Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3915176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3915357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3915401Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3915606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3915644Z fx_g = _create_graph( 2025-09-07T07:34:42.3915679Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3915841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3915874Z fx_g = make_fx( 2025-09-07T07:34:42.3915906Z ^^^^^^^^ 2025-09-07T07:34:42.3916101Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3916146Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3916183Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3916329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3916373Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3916409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3917560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3917599Z t = dispatch_trace( 2025-09-07T07:34:42.3917632Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3917746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3917786Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3917823Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3917947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3917986Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3918020Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3918181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3918290Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3918331Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3918454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3918492Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3918525Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3918651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3918693Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3918726Z ^^^^^^^^^ 2025-09-07T07:34:42.3918858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3918897Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3918932Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3919082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3920064Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3920097Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3920318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3920379Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3920424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3920598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3920637Z outs_pair = fn(*args) 2025-09-07T07:34:42.3920671Z ^^^^^^^^^ 2025-09-07T07:34:42.3920841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3920936Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3920981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3921156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3921195Z outs_pair = fn(*args) 2025-09-07T07:34:42.3921228Z ^^^^^^^^^ 2025-09-07T07:34:42.3921440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3921499Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.3921542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3921735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3921807Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3921851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3922023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3923003Z outs_pair = fn(*args) 2025-09-07T07:34:42.3923038Z ^^^^^^^^^ 2025-09-07T07:34:42.3923228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3923273Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3923308Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3923476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3923545Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3923583Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3923709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3923749Z return handle_torch_function( 2025-09-07T07:34:42.3923785Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3923925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3924001Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3924045Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3924213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.3924253Z return func(*args, **kwargs) 2025-09-07T07:34:42.3924289Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3924415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3924457Z result = _engine_run_backward( 2025-09-07T07:34:42.3924491Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3924636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3924755Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3925736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3925863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3925904Z return user_fn(self, *args) 2025-09-07T07:34:42.3925939Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3926083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3926148Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3926185Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3926341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3926386Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3926420Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3926654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3926693Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3926728Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3926893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3926943Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3926984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3927124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3927172Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3927210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3927371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3927416Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3928406Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3928566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3928604Z t = dispatch_trace( 2025-09-07T07:34:42.3928636Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3928749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3928816Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3928851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3928973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3929011Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3929045Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3929204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3929284Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3929324Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3929446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3929483Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3929516Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3929644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3929685Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3929719Z ^^^^^^^^^ 2025-09-07T07:34:42.3929868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3929916Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3929949Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3930921Z File "", line 1, in 2025-09-07T07:34:42.3931070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3931147Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3931191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3931326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3931400Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3931437Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3931628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3931670Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3931706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3931907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3931951Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3931987Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3932130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3932171Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3932209Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3932344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3932431Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3932476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3932601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3932662Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.3933636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3933762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3933801Z leaves = list(leaves) 2025-09-07T07:34:42.3933834Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3933978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3934014Z return func(x) 2025-09-07T07:34:42.3934046Z ^^^^^^^ 2025-09-07T07:34:42.3934184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3934247Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3934288Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3934455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3934496Z return func(*args, **kwargs) 2025-09-07T07:34:42.3934530Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3934711Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3934794Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3934800Z 2025-09-07T07:34:42.3935005Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3935008Z 2025-09-07T07:34:42.3935009Z 2025-09-07T07:34:42.3935080Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3935268Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3935270Z 2025-09-07T07:34:42.3935356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3935429Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3935464Z inline_call [] 2025-09-07T07:34:42.3935517Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3936523Z inductor [] 2025-09-07T07:34:42.3936632Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3936704Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3936965Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3937080Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3937168Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3937318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3937404Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3937534Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3937658Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3937728Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3937762Z inline_call [] 2025-09-07T07:34:42.3937814Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3937886Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3937954Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3938208Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3938320Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3938369Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3938537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3938623Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3938751Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3939812Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3939867Z =================================== FAILURES =================================== 2025-09-07T07:34:42.3939966Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3940008Z Traceback (most recent call last): 2025-09-07T07:34:42.3940145Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3940179Z self._run_test( 2025-09-07T07:34:42.3940294Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3940349Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3940389Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3940519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3940565Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3940603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3940757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3940802Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3940840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3940975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3941017Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3941075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3941218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3941298Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3941335Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3941486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3942504Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3942656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3942708Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3942748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3942889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3942945Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3942983Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3943101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3943164Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3943208Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3943334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3943397Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3943437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3943577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3943637Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3943677Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3943814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3943853Z return aot_autograd( 2025-09-07T07:34:42.3943887Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3944022Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3944091Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3945070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3945230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3945312Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3945358Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3945543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3945584Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3945770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3945808Z fx_g = _create_graph( 2025-09-07T07:34:42.3945843Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3946008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3946041Z fx_g = make_fx( 2025-09-07T07:34:42.3946073Z ^^^^^^^^ 2025-09-07T07:34:42.3946223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3946270Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3946325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3946471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3946592Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3946629Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3946789Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3946871Z t = dispatch_trace( 2025-09-07T07:34:42.3946904Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3947017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3948008Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3948044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3948168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3948211Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3948246Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3948407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3948484Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3948524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3948649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3948686Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3948720Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3948845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3948886Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3948919Z ^^^^^^^^^ 2025-09-07T07:34:42.3949079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3949118Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3949154Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3949303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3949353Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3949386Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.3949544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3949604Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3950582Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3950757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3950801Z outs_pair = fn(*args) 2025-09-07T07:34:42.3950834Z ^^^^^^^^^ 2025-09-07T07:34:42.3951005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3951069Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3951113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3951287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3951325Z outs_pair = fn(*args) 2025-09-07T07:34:42.3951357Z ^^^^^^^^^ 2025-09-07T07:34:42.3951534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3951592Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3951660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3951854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3951923Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3951967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3952179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.3952217Z outs_pair = fn(*args) 2025-09-07T07:34:42.3952250Z ^^^^^^^^^ 2025-09-07T07:34:42.3952438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3952482Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3953457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3953627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3953671Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3953708Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3953832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3953873Z return handle_torch_function( 2025-09-07T07:34:42.3953911Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3954051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3954125Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.3954169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3954356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3954396Z return func(*args, **kwargs) 2025-09-07T07:34:42.3954431Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3954553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3954594Z result = _engine_run_backward( 2025-09-07T07:34:42.3954629Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3954776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3954896Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3954944Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3955070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3955113Z return user_fn(self, *args) 2025-09-07T07:34:42.3955148Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3956228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3956271Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3956307Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3956465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3956636Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3956672Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3956796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3956833Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3956868Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3957035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3957113Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3957151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3957287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3957336Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3957412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3957573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.3957620Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3957658Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3957817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3957856Z t = dispatch_trace( 2025-09-07T07:34:42.3957889Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3958966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3959009Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3959044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3959167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3959209Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3959242Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3959403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3959479Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3959520Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3959667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3959707Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3959740Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3959866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3959906Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3959938Z ^^^^^^^^^ 2025-09-07T07:34:42.3960089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3960136Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3960215Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3960255Z File "", line 1, in 
2025-09-07T07:34:42.3960398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3960476Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3960522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3961601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3961649Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3961686Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3961880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3961921Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3961957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3962128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.3962172Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.3962231Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3962375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.3962415Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.3962450Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3962584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.3962713Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.3962758Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3962882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.3962941Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.3962985Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3963111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.3963149Z leaves = list(leaves) 2025-09-07T07:34:42.3963182Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.3964237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.3964272Z return func(x) 2025-09-07T07:34:42.3964305Z ^^^^^^^ 2025-09-07T07:34:42.3964444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.3964507Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.3964548Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3964713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3964754Z return func(*args, **kwargs) 2025-09-07T07:34:42.3964811Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3964995Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3965080Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3965083Z 2025-09-07T07:34:42.3965289Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3965291Z 2025-09-07T07:34:42.3965294Z 2025-09-07T07:34:42.3965367Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3965551Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3965554Z 2025-09-07T07:34:42.3965639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3965713Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3965747Z inline_call [] 2025-09-07T07:34:42.3965801Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3965834Z inductor [] 2025-09-07T07:34:42.3965908Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3965978Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3966238Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3967471Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3967525Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3967676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3967792Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3967922Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3968040Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3968111Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3968145Z inline_call [] 2025-09-07T07:34:42.3968236Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3968309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3968378Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3968635Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3968748Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3968798Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3968946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3969030Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3969160Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3969278Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3969347Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.3969381Z inline_call [] 2025-09-07T07:34:42.3969433Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.3970474Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.3970544Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.3970799Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.3970910Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.3970961Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.3971111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.3971195Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.3971323Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.3971447Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3971663Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-07ed59d50b0fe7d1.xml - 2025-09-07T07:34:42.3971721Z =========================== short test summary info ============================ 2025-09-07T07:34:42.3972078Z FAILED [0.2225s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.3972162Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.3972164Z 2025-09-07T07:34:42.3972369Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.3972394Z 2025-09-07T07:34:42.3972395Z 2025-09-07T07:34:42.3972467Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.3972651Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3972653Z 2025-09-07T07:34:42.3972738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.3972829Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.3972895Z ================== 1 failed, 245 deselected, 2 rerun in 1.43s ================== 2025-09-07T07:34:42.3972929Z Got exit code 1 2025-09-07T07:34:42.3972967Z Retrying single test... 2025-09-07T07:34:42.3973392Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.3974488Z import pkg_resources 2025-09-07T07:34:42.3974658Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1415e447debb96d8.xml 2025-09-07T07:34:42.3974715Z ============================= test session starts ============================== 2025-09-07T07:34:42.3974830Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.3974869Z cachedir: .pytest_cache 2025-09-07T07:34:42.3975024Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.3975068Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.3975106Z configfile: pytest.ini 2025-09-07T07:34:42.3975267Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.3975367Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.3975590Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.3975631Z Running 1 items in this shard 2025-09-07T07:34:42.3975633Z 2025-09-07T07:34:42.3975890Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:46.762615332 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3975995Z [W907 07:22:46.850128109 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3976096Z [W907 07:22:46.966067485 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.3976143Z ('RERUN', {'yellow': True}) [0.6623s] [100%] 2025-09-07T07:34:42.3976393Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:46.282156114 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3976558Z [W907 07:22:46.282434740 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3976604Z ('RERUN', {'yellow': True}) [0.2004s] [100%] 2025-09-07T07:34:42.3976850Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True [W907 07:22:46.484485675 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3976950Z [W907 07:22:46.484737311 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.3978017Z FAILED [0.1992s] [100%] 2025-09-07T07:34:42.3978020Z 2025-09-07T07:34:42.3978070Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.3978207Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.3978251Z Traceback (most recent call last): 2025-09-07T07:34:42.3978391Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.3978426Z self._run_test( 2025-09-07T07:34:42.3978538Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.3978594Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.3978671Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3978806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.3978851Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.3978890Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3979040Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.3979089Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.3979126Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3979262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.3979305Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.3979342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3979485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.3979564Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.3979603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3980703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.3980749Z raise BackendCompilerFailed( 2025-09-07T07:34:42.3980929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.3980981Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3981021Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3981163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.3981213Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.3981255Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3981370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.3981436Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.3981478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3981605Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.3981669Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.3981710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3981849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.3981893Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.3981930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3982070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.3982109Z return aot_autograd( 2025-09-07T07:34:42.3982144Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.3982279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.3982347Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.3983348Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3983510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.3983593Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.3983638Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3983853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.3983897Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.3984082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.3984122Z fx_g = _create_graph( 2025-09-07T07:34:42.3984156Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3984322Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.3984357Z fx_g = make_fx( 2025-09-07T07:34:42.3984389Z ^^^^^^^^ 2025-09-07T07:34:42.3984541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.3984585Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.3984623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3984770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.3984813Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.3984848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3985005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3985041Z t = dispatch_trace( 2025-09-07T07:34:42.3985098Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3985210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3986188Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3986224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3986348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3986387Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3986422Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3986646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3986726Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3986766Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3986892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3986933Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.3986967Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3987093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3987134Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3987168Z ^^^^^^^^^ 2025-09-07T07:34:42.3987298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.3987340Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.3987374Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3987523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3987572Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3987605Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3987761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.3988801Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.3988846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3989022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3989060Z outs_pair = fn(*args) 2025-09-07T07:34:42.3989095Z ^^^^^^^^^ 2025-09-07T07:34:42.3989305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.3989373Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.3989416Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3989589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3989631Z outs_pair = fn(*args) 2025-09-07T07:34:42.3989665Z ^^^^^^^^^ 2025-09-07T07:34:42.3989841Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.3989900Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.3989941Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3990139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.3990208Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.3990254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3990427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.3990499Z outs_pair = fn(*args) 2025-09-07T07:34:42.3990532Z ^^^^^^^^^ 2025-09-07T07:34:42.3990721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.3990765Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.3991738Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3991910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.3991956Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.3991991Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3992117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.3992158Z return handle_torch_function( 2025-09-07T07:34:42.3992197Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3992338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.3992411Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.3992456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3992622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.3992662Z return func(*args, **kwargs) 2025-09-07T07:34:42.3992698Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3992823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.3992864Z result = _engine_run_backward( 2025-09-07T07:34:42.3992899Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3993045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.3993192Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.3993239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3993368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.3993409Z return user_fn(self, *args) 2025-09-07T07:34:42.3994378Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3994559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.3994602Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.3994637Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3994794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.3994841Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.3994877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3994998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3995037Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.3995072Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3995235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.3995287Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.3995326Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3995462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.3995510Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.3995548Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3995725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.3995775Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.3995813Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3995975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.3996012Z t = dispatch_trace( 2025-09-07T07:34:42.3996046Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3997166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.3997210Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.3997245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3997369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3997406Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3997445Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3997603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.3997682Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.3997721Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.3997844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.3997881Z return fn(*args, **kwargs) 2025-09-07T07:34:42.3997916Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3998042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.3998082Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.3998115Z ^^^^^^^^^ 2025-09-07T07:34:42.3998265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.3998341Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.3998374Z ^^^^^^^^^^^ 2025-09-07T07:34:42.3998416Z File "", line 1, in 2025-09-07T07:34:42.3998558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.3998637Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.3999617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.3999796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.3999843Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.3999881Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4000073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4000120Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4000195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4000366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4000409Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4000446Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.4000589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4000631Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4000665Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4000798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4000885Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4000950Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4001077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4001137Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4001179Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4001305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4001343Z leaves = list(leaves) 2025-09-07T07:34:42.4001377Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4002445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4002482Z return func(x) 2025-09-07T07:34:42.4002513Z ^^^^^^^ 2025-09-07T07:34:42.4002650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4002717Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4002758Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4002925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4002966Z return func(*args, **kwargs) 2025-09-07T07:34:42.4003000Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4003184Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4003270Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4003272Z 2025-09-07T07:34:42.4003476Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4003480Z 2025-09-07T07:34:42.4003484Z 2025-09-07T07:34:42.4003574Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4003759Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4003762Z 2025-09-07T07:34:42.4003845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4003919Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4003953Z inline_call [] 2025-09-07T07:34:42.4004035Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4004069Z inductor [] 2025-09-07T07:34:42.4004142Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4004213Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4005402Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4005521Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4005571Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4005722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4005807Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4005940Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4006061Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4006160Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4006202Z Traceback (most recent call last): 2025-09-07T07:34:42.4006356Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4006391Z self._run_test( 2025-09-07T07:34:42.4006564Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4006620Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4006660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4006790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4006838Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4006876Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4007027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4007072Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4007110Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4007248Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4007291Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4008273Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4008417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4008496Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4008535Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4008687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4008731Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4008879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4008935Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4009000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4009144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4009194Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4009233Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4009348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4009451Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4009496Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4009620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4009683Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4009724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4009866Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4009909Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4009945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4010081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4011056Z return aot_autograd( 2025-09-07T07:34:42.4011091Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4011229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4011297Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4011342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4011502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4011616Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4011660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4011843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4011885Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4012073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4012111Z fx_g = _create_graph( 2025-09-07T07:34:42.4012146Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4012308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4012341Z fx_g = make_fx( 2025-09-07T07:34:42.4012375Z ^^^^^^^^ 2025-09-07T07:34:42.4012528Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4012572Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4012609Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4012753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4012795Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4013764Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4013923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4013960Z t = dispatch_trace( 2025-09-07T07:34:42.4013994Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4014105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4014148Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4014210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4014333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4014373Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4014407Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4014568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4014676Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4014718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4014842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4014880Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4014913Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4015040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4015084Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4015118Z ^^^^^^^^^ 2025-09-07T07:34:42.4015249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4015288Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4015322Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4016405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4016455Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4016554Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4016710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4016771Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4016842Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4017020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4017059Z outs_pair = fn(*args) 2025-09-07T07:34:42.4017092Z ^^^^^^^^^ 2025-09-07T07:34:42.4017262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4017329Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4017373Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4017546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4017585Z outs_pair = fn(*args) 2025-09-07T07:34:42.4017617Z ^^^^^^^^^ 2025-09-07T07:34:42.4017797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4017855Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4017898Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4018094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4018165Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4018209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4019325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4019364Z outs_pair = fn(*args) 2025-09-07T07:34:42.4019397Z ^^^^^^^^^ 2025-09-07T07:34:42.4019588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4019663Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4019698Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4019866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4019911Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4019947Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4020110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4020154Z return handle_torch_function( 2025-09-07T07:34:42.4020189Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4020329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4020406Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4020453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4020618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4020659Z return func(*args, **kwargs) 2025-09-07T07:34:42.4020693Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4020816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4020859Z result = _engine_run_backward( 2025-09-07T07:34:42.4020894Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4021042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4022101Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4022169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4022297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4022337Z return user_fn(self, *args) 2025-09-07T07:34:42.4022372Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4022515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4022558Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4022594Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4022750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4022794Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4022828Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4022951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4022992Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4023027Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4023191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4023241Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4023280Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4023418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4023466Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4023504Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4023663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4024644Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4024707Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4024870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4024907Z t = dispatch_trace( 2025-09-07T07:34:42.4024940Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4025052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4025094Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4025160Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4025284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4025323Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4025356Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4025517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4025598Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4025639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4025761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4025799Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4025832Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4025958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4025999Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4026033Z ^^^^^^^^^ 2025-09-07T07:34:42.4026181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4026229Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4027260Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4027340Z File "", line 1, in 2025-09-07T07:34:42.4027484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4027562Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4027605Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4027740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4027790Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4027828Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4028020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4028062Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4028096Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4028270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4028314Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4028350Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4028493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4028537Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4028574Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4028709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4028797Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4028842Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4028966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4029051Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4030038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4030164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4030203Z leaves = list(leaves) 2025-09-07T07:34:42.4030236Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4030359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4030434Z return func(x) 2025-09-07T07:34:42.4030467Z ^^^^^^^ 2025-09-07T07:34:42.4030604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4030669Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4030710Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4030882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4030924Z return func(*args, **kwargs) 2025-09-07T07:34:42.4030960Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4031141Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4031225Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4031227Z 2025-09-07T07:34:42.4031434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4031437Z 2025-09-07T07:34:42.4031438Z 2025-09-07T07:34:42.4031511Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4031695Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4031723Z 2025-09-07T07:34:42.4031808Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4031880Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4031915Z inline_call [] 2025-09-07T07:34:42.4032913Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4032948Z inductor [] 2025-09-07T07:34:42.4033022Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4033095Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4033353Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4033468Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4033521Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4033671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4033756Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4033888Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4034010Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4034081Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4034114Z inline_call [] 2025-09-07T07:34:42.4034167Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4034238Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4034308Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4034582Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4034693Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4034742Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4034919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4035004Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4036073Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4036192Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4036242Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4036343Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4036388Z Traceback (most recent call last): 2025-09-07T07:34:42.4036592Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4036627Z self._run_test( 2025-09-07T07:34:42.4036739Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4036795Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4036835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4036967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4037014Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4037053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4037233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4037281Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4037319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4037457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4037500Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4037536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4037680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4037760Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4037798Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4038900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4038950Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4039100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4039152Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4039191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4039334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4039386Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4039424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4039539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4039604Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4039647Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4039801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4039864Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4039904Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4040046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4040089Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4040220Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4040358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4040398Z return aot_autograd( 2025-09-07T07:34:42.4040432Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4040568Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4040637Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4041633Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4041793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4041876Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4041922Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4042107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4042149Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4042335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4042374Z fx_g = _create_graph( 2025-09-07T07:34:42.4042432Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4042595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4042629Z fx_g = make_fx( 2025-09-07T07:34:42.4042661Z ^^^^^^^^ 2025-09-07T07:34:42.4042812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4042857Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4042896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4043043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4043085Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4043120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4043279Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4043317Z t = dispatch_trace( 2025-09-07T07:34:42.4043350Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4044399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4044441Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4044476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4044600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4044641Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4044675Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4044839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4044916Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4044957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4045082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4045142Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4045177Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4045302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4045343Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4045378Z ^^^^^^^^^ 2025-09-07T07:34:42.4045540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4045581Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4045615Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4045764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4045812Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4045847Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4046005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4047089Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4047134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4047310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4047348Z outs_pair = fn(*args) 2025-09-07T07:34:42.4047385Z ^^^^^^^^^ 2025-09-07T07:34:42.4047556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4047623Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4047666Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4047866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4047907Z outs_pair = fn(*args) 2025-09-07T07:34:42.4047940Z ^^^^^^^^^ 2025-09-07T07:34:42.4048117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4048176Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4048220Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4048414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4048484Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4048529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4048701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4048740Z outs_pair = fn(*args) 2025-09-07T07:34:42.4048773Z ^^^^^^^^^ 2025-09-07T07:34:42.4048962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4049949Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4049985Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4050159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4050204Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4050240Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4050364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4050409Z return handle_torch_function( 2025-09-07T07:34:42.4050468Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4050610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4050684Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4050729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4050946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4050987Z return func(*args, **kwargs) 2025-09-07T07:34:42.4051023Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4051146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4051187Z result = _engine_run_backward( 2025-09-07T07:34:42.4051222Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4051371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4051491Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4051540Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4051665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4051707Z return user_fn(self, *args) 2025-09-07T07:34:42.4052680Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4052826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4052868Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4052905Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4053065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4053131Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4053166Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4053292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4053330Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4053366Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4053533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4053585Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4053624Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4053760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4053808Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4053850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4054012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4054058Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4054096Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4054255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4054292Z t = dispatch_trace( 2025-09-07T07:34:42.4055267Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4055380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4055422Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4055457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4055581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4055645Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4055679Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4055839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4055916Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4055957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4056114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4056153Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4056186Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4056312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4056353Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4056387Z ^^^^^^^^^ 2025-09-07T07:34:42.4056676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4056728Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4056761Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4056803Z File "", line 1, in 
2025-09-07T07:34:42.4056948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4057025Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4058038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4058176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4058222Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4058259Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4058450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4058524Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4058558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4058730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4058773Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4058810Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4058955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4058996Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4059032Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4059168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4059259Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4059304Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4059430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4059489Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4059532Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4059658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4059697Z leaves = list(leaves) 2025-09-07T07:34:42.4059731Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4060794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4060829Z return func(x) 2025-09-07T07:34:42.4060861Z ^^^^^^^ 2025-09-07T07:34:42.4060999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4061092Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4061133Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4061299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4061339Z return func(*args, **kwargs) 2025-09-07T07:34:42.4061374Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4061591Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4061676Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4061679Z 2025-09-07T07:34:42.4061886Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4061891Z 2025-09-07T07:34:42.4061893Z 2025-09-07T07:34:42.4061966Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4062153Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4062155Z 2025-09-07T07:34:42.4062240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4062314Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4062349Z inline_call [] 2025-09-07T07:34:42.4062401Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4062435Z inductor [] 2025-09-07T07:34:42.4062507Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4062579Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4063810Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4063928Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4063978Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4064129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4064216Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4064348Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4064465Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4064536Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4064572Z inline_call [] 2025-09-07T07:34:42.4064627Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4064697Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4064767Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4065021Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4065135Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4065184Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4065334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4065418Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4065549Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4065689Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4065758Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4065792Z inline_call [] 2025-09-07T07:34:42.4066861Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4066934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4067055Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4067311Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4067422Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4067474Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4067621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4067704Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4067831Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4067950Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4068164Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1415e447debb96d8.xml - 2025-09-07T07:34:42.4068222Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4068578Z FAILED [0.1992s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4068689Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4068691Z 2025-09-07T07:34:42.4068898Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4068900Z 2025-09-07T07:34:42.4068902Z 2025-09-07T07:34:42.4068974Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4069161Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4069163Z 2025-09-07T07:34:42.4069246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4069306Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.4069371Z ================== 1 failed, 245 deselected, 2 rerun in 1.23s ================== 2025-09-07T07:34:42.4069406Z Got exit code 1 2025-09-07T07:34:42.4069529Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.4070908Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4070949Z import pkg_resources 2025-09-07T07:34:42.4071117Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-8a78b8e7c77a756d.xml 2025-09-07T07:34:42.4071172Z ============================= test session starts ============================== 2025-09-07T07:34:42.4071312Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4071350Z cachedir: .pytest_cache 2025-09-07T07:34:42.4071506Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4071551Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4071589Z configfile: pytest.ini 2025-09-07T07:34:42.4071786Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4071863Z collecting ... collected 467 items / 47 deselected / 420 selected 2025-09-07T07:34:42.4071913Z stepcurrent: skipping 47 already run items. 
2025-09-07T07:34:42.4071955Z Running 199 items in this shard 2025-09-07T07:34:42.4071957Z 2025-09-07T07:34:42.4072214Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_False [W907 07:22:55.064994541 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072323Z [W907 07:22:56.641964435 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072424Z [W907 07:22:56.786511719 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072525Z [W907 07:22:58.453414497 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072624Z [W907 07:22:58.462495193 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072722Z [W907 07:22:58.509791864 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072819Z [W907 07:22:58.510315486 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4072916Z [W907 07:22:58.512450165 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4073977Z [W907 07:22:58.512643422 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074079Z [W907 07:22:58.512803659 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074177Z [W907 07:22:58.512993596 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074274Z [W907 07:22:58.513179763 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074373Z [W907 07:22:58.513335961 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074471Z [W907 07:22:58.513500459 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.4074568Z [W907 07:22:58.513669937 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074666Z [W907 07:22:58.513820494 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074765Z [W907 07:22:58.514015271 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074862Z [W907 07:22:58.514292648 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4074959Z [W907 07:22:58.514464625 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075056Z [W907 07:22:58.514654782 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075152Z [W907 07:22:58.514834099 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075249Z [W907 07:22:58.515035776 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075345Z [W907 07:22:58.515213574 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075442Z [W907 07:22:58.515405561 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075541Z [W907 07:22:58.515582898 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075664Z [W907 07:22:58.515755145 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075761Z [W907 07:22:58.515922824 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075858Z [W907 07:22:58.516979068 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4075984Z [W907 07:22:58.517184884 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.4077091Z [W907 07:22:58.517380192 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077191Z [W907 07:22:58.517539320 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077288Z [W907 07:22:58.517690697 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077386Z [W907 07:22:58.517861385 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077489Z [W907 07:22:58.518036222 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077586Z [W907 07:22:58.518192829 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077683Z [W907 07:22:58.518369897 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077782Z [W907 07:22:58.518635083 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077879Z [W907 07:22:58.518800320 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4077975Z [W907 07:22:58.518970558 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078072Z [W907 07:22:58.519131426 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078194Z [W907 07:22:58.519279604 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078292Z [W907 07:22:58.519442881 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078388Z [W907 07:22:58.519607099 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078484Z [W907 07:22:58.519767117 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.4078583Z [W907 07:22:58.519932723 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078679Z [W907 07:22:58.520950009 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078778Z [W907 07:22:58.521151086 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078874Z [W907 07:22:58.521338633 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4078972Z [W907 07:22:58.521511731 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4079068Z [W907 07:22:58.521675709 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4079165Z [W907 07:22:58.521943504 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4080254Z [W907 07:22:58.522128481 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4080355Z [W907 07:22:58.522309719 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4080453Z [W907 07:22:58.522463106 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4080549Z [W907 07:22:58.522630274 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4080586Z PASSED [3.7405s] [ 0%] 2025-09-07T07:34:42.4080833Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:22:59.026918582 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4080965Z [W907 07:22:59.027221858 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.4081012Z ('RERUN', {'yellow': True}) [0.7220s] [ 1%] 2025-09-07T07:34:42.4081291Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:00.755924578 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4081389Z [W907 07:23:00.756422561 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4081434Z ('RERUN', {'yellow': True}) [0.6818s] [ 1%] 2025-09-07T07:34:42.4081676Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:00.439917401 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4081776Z [W907 07:23:00.440260375 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4081812Z FAILED [0.6930s] [ 1%] 2025-09-07T07:34:42.4081814Z 2025-09-07T07:34:42.4081864Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4081962Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4082005Z Traceback (most recent call last): 2025-09-07T07:34:42.4082147Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4082181Z self._run_test( 2025-09-07T07:34:42.4082294Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4082349Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4083334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4083490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4083539Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4083576Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4083729Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4083774Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4083812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4083949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4083992Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4084028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4084172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4084255Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4084295Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4084446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4084492Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4084641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4084695Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4084736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4084879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4084929Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4084967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4085082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4086109Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4086153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4086280Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4086342Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4086384Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4086617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4086662Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4086700Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4086837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4086878Z return aot_autograd( 2025-09-07T07:34:42.4086915Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4087051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4087120Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4087165Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4087325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4087409Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4087453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4087636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4087700Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4087887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4087925Z fx_g = _create_graph( 2025-09-07T07:34:42.4088907Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4089072Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4089106Z fx_g = make_fx( 2025-09-07T07:34:42.4089137Z ^^^^^^^^ 2025-09-07T07:34:42.4089294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4089339Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4089377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4089523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4089571Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4089606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4089765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4089802Z t = dispatch_trace( 2025-09-07T07:34:42.4089836Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4089947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4089988Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4090025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4090150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4090190Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4090224Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4090386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4090494Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4090535Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4091601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4091641Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.4091674Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4091832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4091874Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4091908Z ^^^^^^^^^ 2025-09-07T07:34:42.4092040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4092080Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4092114Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4092269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4092318Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4092352Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4092507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4092569Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4092614Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4092789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4092828Z outs_pair = fn(*args) 2025-09-07T07:34:42.4092861Z ^^^^^^^^^ 2025-09-07T07:34:42.4093033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4093115Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4093162Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4093334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4094304Z outs_pair = fn(*args) 2025-09-07T07:34:42.4094337Z ^^^^^^^^^ 2025-09-07T07:34:42.4094516Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4094575Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4094617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4094811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4094884Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4094928Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4095102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4095139Z outs_pair = fn(*args) 2025-09-07T07:34:42.4095173Z ^^^^^^^^^ 2025-09-07T07:34:42.4095369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4095413Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4095449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4095618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4095665Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4095721Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4095848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4095890Z return handle_torch_function( 2025-09-07T07:34:42.4095925Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4096065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4097192Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.4097240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4097407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4097447Z return func(*args, **kwargs) 2025-09-07T07:34:42.4097482Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4097606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4097649Z result = _engine_run_backward( 2025-09-07T07:34:42.4097685Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4097831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4097951Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4098002Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4098128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4098170Z return user_fn(self, *args) 2025-09-07T07:34:42.4098205Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4098351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4098415Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4098451Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4098608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4098653Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4098689Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4098816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4098855Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.4099830Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4099996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4100047Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4100086Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4100227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4100275Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4100313Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4100473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4100520Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4100560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4100718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4100755Z t = dispatch_trace( 2025-09-07T07:34:42.4100789Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4100901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4100969Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4101005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4101130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4101169Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4101202Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4101365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4101471Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4102448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4102572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4102610Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4102643Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4102770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4102812Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4102846Z ^^^^^^^^^ 2025-09-07T07:34:42.4102994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4103042Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4103075Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4103117Z File "", line 1, in 2025-09-07T07:34:42.4103261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4103338Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4103383Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4103519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4103588Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4103626Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4103818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4103861Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4103896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4104072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4104116Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4105084Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.4105228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4105269Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4105308Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4105440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4105528Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4105573Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4105698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4105760Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4105803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4105928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4105966Z leaves = list(leaves) 2025-09-07T07:34:42.4105999Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4106124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4106177Z return func(x) 2025-09-07T07:34:42.4106209Z ^^^^^^^ 2025-09-07T07:34:42.4106347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4106411Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4106452Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4106729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4106770Z return func(*args, **kwargs) 2025-09-07T07:34:42.4107747Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4107930Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4108015Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4108021Z 2025-09-07T07:34:42.4108229Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4108232Z 2025-09-07T07:34:42.4108234Z 2025-09-07T07:34:42.4108305Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4108492Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4108494Z 2025-09-07T07:34:42.4108580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4108653Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4108688Z inline_call [] 2025-09-07T07:34:42.4108741Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4108847Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4108918Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4109177Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4109291Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4109342Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4109493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4109580Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4109711Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4109832Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4109932Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4109975Z Traceback (most recent call last): 2025-09-07T07:34:42.4110111Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4111083Z self._run_test( 2025-09-07T07:34:42.4111199Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4111256Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4111295Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4111428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4111473Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4111512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4111688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4111733Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4111772Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4111907Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4111950Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4111986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4112160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4112239Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4112278Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4112429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4112478Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4112626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4112679Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4112717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4113791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4113845Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4113883Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4113998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4114063Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4114106Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4114263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4114325Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4114367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4114505Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4114549Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4114587Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4114724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4114762Z return aot_autograd( 2025-09-07T07:34:42.4114796Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4114932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4115003Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4115049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4115207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4115290Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4115334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4115518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4116547Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4116734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4116775Z fx_g = _create_graph( 2025-09-07T07:34:42.4116839Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4117003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4117037Z fx_g = make_fx( 2025-09-07T07:34:42.4117069Z ^^^^^^^^ 2025-09-07T07:34:42.4117222Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4117267Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4117342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4117490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4117532Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4117567Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4117726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4117765Z t = dispatch_trace( 2025-09-07T07:34:42.4117800Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4117912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4117953Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4117987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4118113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4118153Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4119136Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4119299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4119377Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4119418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4119567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4119606Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4119640Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4119766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4119806Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4119842Z ^^^^^^^^^ 2025-09-07T07:34:42.4119976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4120017Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4120051Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4120242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4120290Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4120326Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4120485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4120547Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4120590Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4120765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4120805Z outs_pair = fn(*args) 2025-09-07T07:34:42.4120839Z ^^^^^^^^^ 2025-09-07T07:34:42.4121950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4122017Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4122061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4122236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4122297Z outs_pair = fn(*args) 2025-09-07T07:34:42.4122331Z ^^^^^^^^^ 2025-09-07T07:34:42.4122507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4122566Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4122636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4122832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4122901Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4122946Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4123119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4123157Z outs_pair = fn(*args) 2025-09-07T07:34:42.4123191Z ^^^^^^^^^ 2025-09-07T07:34:42.4123382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4123427Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4123462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4123632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4123678Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4123715Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4123838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4124836Z return handle_torch_function( 2025-09-07T07:34:42.4124871Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4125013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4125086Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4125131Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4125298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4125338Z return func(*args, **kwargs) 2025-09-07T07:34:42.4125373Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4125496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4125537Z result = _engine_run_backward( 2025-09-07T07:34:42.4125572Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4125721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4125842Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4125891Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4126017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4126059Z return user_fn(self, *args) 2025-09-07T07:34:42.4126095Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4126241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4126284Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4126319Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4126476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4127548Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4127584Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4127707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4127746Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4127781Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4127998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4128051Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4128089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4128225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4128273Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4128317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4128475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4128523Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4128560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4128722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4128761Z t = dispatch_trace( 2025-09-07T07:34:42.4128796Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4128908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4128950Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4128984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4129110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4129170Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4130143Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4130307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4130386Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4130426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4130553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4130591Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4130626Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4130754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4130794Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4130828Z ^^^^^^^^^ 2025-09-07T07:34:42.4130980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4131030Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4131063Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4131105Z File "", line 1, in 2025-09-07T07:34:42.4131246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4131323Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4131369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4131505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4131551Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4131588Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4131778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4132772Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4132807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4132979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4133022Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4133058Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4133234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4133277Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4133311Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4133446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4133534Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4133582Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4133705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4133765Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4133807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4133933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4133971Z leaves = list(leaves) 2025-09-07T07:34:42.4134004Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4134127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4134161Z return func(x) 2025-09-07T07:34:42.4134193Z ^^^^^^^ 2025-09-07T07:34:42.4134347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4134412Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4135384Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4135553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4135592Z return func(*args, **kwargs) 2025-09-07T07:34:42.4135628Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4135814Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4135899Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4135901Z 2025-09-07T07:34:42.4136110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4136115Z 2025-09-07T07:34:42.4136117Z 2025-09-07T07:34:42.4136190Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4136374Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4136376Z 2025-09-07T07:34:42.4136462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4136628Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4136664Z inline_call [] 2025-09-07T07:34:42.4136717Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4136791Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4136862Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4137120Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4137263Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4137314Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4137463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4137585Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4137716Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4138798Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4138869Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4138906Z inline_call [] 2025-09-07T07:34:42.4138961Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4139033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4139101Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4139360Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4139474Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4139523Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4139673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4139757Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4139910Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4140031Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4140081Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4140180Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4140222Z Traceback (most recent call last): 2025-09-07T07:34:42.4140362Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4140396Z self._run_test( 2025-09-07T07:34:42.4140510Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4140563Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4140602Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4140734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4141728Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4141767Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4141917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4141962Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4142000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4142138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4142181Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4142218Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4142361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4142444Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4142507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4142659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4142704Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4142856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4142908Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4142979Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4143122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4143173Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4143210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4143327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4143393Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4144377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4144504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4144567Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4144608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4144752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4144795Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4144833Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4144970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4145029Z return aot_autograd( 2025-09-07T07:34:42.4145064Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4145203Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4145271Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4145317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4145479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4145562Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4145605Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4145787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4145831Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4146017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4146057Z fx_g = _create_graph( 2025-09-07T07:34:42.4146091Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4146254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4147274Z fx_g = make_fx( 2025-09-07T07:34:42.4147309Z ^^^^^^^^ 2025-09-07T07:34:42.4147464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4147509Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4147546Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4147693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4147737Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4147800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4147958Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4147996Z t = dispatch_trace( 2025-09-07T07:34:42.4148029Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4148142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4148182Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4148263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4148389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4148429Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4148464Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4148627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4148707Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4148747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4148870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4149849Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4149883Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4150013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4150053Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4150087Z ^^^^^^^^^ 2025-09-07T07:34:42.4150220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4150259Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4150294Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4150469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4150521Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4150554Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4150711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4150771Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4150817Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4150991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4151029Z outs_pair = fn(*args) 2025-09-07T07:34:42.4151063Z ^^^^^^^^^ 2025-09-07T07:34:42.4151235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4151304Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4151348Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4151520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4151558Z outs_pair = fn(*args) 2025-09-07T07:34:42.4152598Z ^^^^^^^^^ 2025-09-07T07:34:42.4152778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4152837Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4152879Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4153073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4153167Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4153212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4153385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4153422Z outs_pair = fn(*args) 2025-09-07T07:34:42.4153455Z ^^^^^^^^^ 2025-09-07T07:34:42.4153679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4153725Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4153760Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4153929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4153976Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4154013Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4154139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4154181Z return handle_torch_function( 2025-09-07T07:34:42.4154216Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4154356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4154432Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4154476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4155583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4155624Z return func(*args, **kwargs) 2025-09-07T07:34:42.4155660Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4155803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4155848Z result = _engine_run_backward( 2025-09-07T07:34:42.4155882Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4156028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4156148Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4156199Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4156324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4156366Z return user_fn(self, *args) 2025-09-07T07:34:42.4156402Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4156607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4156654Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4156690Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4156848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4156892Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4156926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4157051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4157090Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4157124Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4158233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4158286Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4158324Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4158494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4158544Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4158581Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4158743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4158789Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4158864Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4159024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4159062Z t = dispatch_trace( 2025-09-07T07:34:42.4159095Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4159208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4159252Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4159288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4159411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4159450Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4159483Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4159644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4159723Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4159763Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4159886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4160926Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4160960Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4161112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4161155Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4161189Z ^^^^^^^^^ 2025-09-07T07:34:42.4161341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4161390Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4161424Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4161464Z File "", line 1, in 
2025-09-07T07:34:42.4161612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4161689Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4161734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4161870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4161920Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4161957Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4162149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4162191Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4162226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4162401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4162445Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4162480Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4163560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4163602Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4163660Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4163794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4163881Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4163926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4164051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4164138Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4164181Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4164307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4164346Z leaves = list(leaves) 2025-09-07T07:34:42.4164379Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4164504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4164542Z return func(x) 2025-09-07T07:34:42.4164573Z ^^^^^^^ 2025-09-07T07:34:42.4164711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4164774Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4164816Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4164984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4165025Z return func(*args, **kwargs) 2025-09-07T07:34:42.4165059Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4165242Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4166259Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4166283Z 2025-09-07T07:34:42.4166550Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4166552Z 2025-09-07T07:34:42.4166554Z 2025-09-07T07:34:42.4166627Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4166813Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4166815Z 2025-09-07T07:34:42.4166901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4166974Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4167008Z inline_call [] 2025-09-07T07:34:42.4167062Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4167137Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4167210Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4167470Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4167585Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4167634Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4167788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4167873Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4168003Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4168124Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4168224Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4168258Z inline_call [] 2025-09-07T07:34:42.4168312Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4168382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4169402Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4169693Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4169806Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4169855Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4170005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4170093Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4170222Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4170338Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4170408Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4170443Z inline_call [] 2025-09-07T07:34:42.4170496Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4170568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4170638Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4170889Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4171024Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4171072Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4171222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4171306Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4171436Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4171552Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4171768Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-8a78b8e7c77a756d.xml - 2025-09-07T07:34:42.4171828Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4173128Z FAILED [0.6930s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4173212Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4173215Z 2025-09-07T07:34:42.4173425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4173428Z 2025-09-07T07:34:42.4173429Z 2025-09-07T07:34:42.4173501Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4173686Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4173709Z 2025-09-07T07:34:42.4173794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4173853Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.4173922Z ============= 1 failed, 1 passed, 47 deselected, 2 rerun in 6.12s ============== 2025-09-07T07:34:42.4173957Z Got exit code 1 2025-09-07T07:34:42.4173994Z Retrying single test... 2025-09-07T07:34:42.4174450Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4174488Z import pkg_resources 2025-09-07T07:34:42.4174657Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-17114a6d624aa143.xml 2025-09-07T07:34:42.4174713Z ============================= test session starts ============================== 2025-09-07T07:34:42.4174827Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4174865Z cachedir: .pytest_cache 2025-09-07T07:34:42.4175023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4175067Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4175107Z configfile: pytest.ini 2025-09-07T07:34:42.4175267Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4175344Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.4176574Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4176652Z Running 1 items in this shard 2025-09-07T07:34:42.4176654Z 2025-09-07T07:34:42.4176908Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:08.321815882 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4177016Z [W907 07:23:09.848357810 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4177121Z [W907 07:23:09.963147134 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.4177169Z ('RERUN', {'yellow': True}) [1.2864s] [100%] 2025-09-07T07:34:42.4177412Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:10.871059266 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4177513Z [W907 07:23:10.871459810 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4177561Z ('RERUN', {'yellow': True}) [0.7162s] [100%] 2025-09-07T07:34:42.4177803Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:11.663867780 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4177903Z [W907 07:23:11.664284434 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4177940Z FAILED [0.7905s] [100%] 2025-09-07T07:34:42.4177942Z 2025-09-07T07:34:42.4177993Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4178092Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4178134Z Traceback (most recent call last): 2025-09-07T07:34:42.4178276Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4178311Z self._run_test( 2025-09-07T07:34:42.4178447Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4178502Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4178541Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4178676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4179670Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4179748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4179900Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4179946Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4179984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4180119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4180168Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4180204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4180349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4180428Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4180466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4180620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4180666Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4180816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4180869Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4180908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4181071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4181122Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4181160Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4181276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4181341Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4181386Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4182462Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4182528Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4182569Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4182709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4182756Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4182794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4182933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4182972Z return aot_autograd( 2025-09-07T07:34:42.4183006Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4183143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4183213Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4183258Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4183419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4183502Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4183566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4183749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4183791Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4183977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4184047Z fx_g = _create_graph( 2025-09-07T07:34:42.4184084Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4184246Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4185212Z fx_g = make_fx( 2025-09-07T07:34:42.4185244Z ^^^^^^^^ 2025-09-07T07:34:42.4185397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4185446Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4185484Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4185630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4185673Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4185707Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4185870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4185906Z t = dispatch_trace( 2025-09-07T07:34:42.4185939Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4186052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4186092Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4186128Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4186280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4186321Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4186356Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4186589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4186668Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4186710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4186836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4186875Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.4187848Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4187975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4188016Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4188055Z ^^^^^^^^^ 2025-09-07T07:34:42.4188187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4188226Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4188260Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4188408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4188456Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4188491Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4188648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4188709Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4188753Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4188931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4189000Z outs_pair = fn(*args) 2025-09-07T07:34:42.4189034Z ^^^^^^^^^ 2025-09-07T07:34:42.4189208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4189274Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4189318Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4189528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4189568Z outs_pair = fn(*args) 2025-09-07T07:34:42.4189601Z ^^^^^^^^^ 2025-09-07T07:34:42.4190715Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4190777Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4190820Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4191014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4191084Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4191128Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4191303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4191341Z outs_pair = fn(*args) 2025-09-07T07:34:42.4191375Z ^^^^^^^^^ 2025-09-07T07:34:42.4191564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4191634Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4191670Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4191839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4191884Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4191920Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4192046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4192090Z return handle_torch_function( 2025-09-07T07:34:42.4192125Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4192266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4192340Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.4192384Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4193484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4193526Z return func(*args, **kwargs) 2025-09-07T07:34:42.4193561Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4193685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4193725Z result = _engine_run_backward( 2025-09-07T07:34:42.4193759Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4193907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4194027Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4194076Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4194204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4194262Z return user_fn(self, *args) 2025-09-07T07:34:42.4194298Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4194442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4194484Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4194521Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4194706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4194751Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4194786Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4194911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4194949Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.4194986Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4195152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4196135Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4196174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4196312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4196362Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4196401Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4196626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4196674Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4196712Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4196896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4196936Z t = dispatch_trace( 2025-09-07T07:34:42.4196970Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4197082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4197123Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4197159Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4197284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4197322Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4197356Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4197517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4197595Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4197636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4197760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4198745Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4198779Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4198907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4198947Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4198981Z ^^^^^^^^^ 2025-09-07T07:34:42.4199131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4199179Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4199212Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4199254Z File "", line 1, in 2025-09-07T07:34:42.4199397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4199504Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4199549Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4199686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4199732Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4199769Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4199996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4200039Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4200075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4200294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4200343Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4200379Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.4200523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4201506Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4201542Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4201677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4201767Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4201812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4201938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4201997Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4202058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4202185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4202223Z leaves = list(leaves) 2025-09-07T07:34:42.4202256Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4202381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4202415Z return func(x) 2025-09-07T07:34:42.4202447Z ^^^^^^^ 2025-09-07T07:34:42.4202586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.4202651Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4202692Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4202860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4202904Z return func(*args, **kwargs) 2025-09-07T07:34:42.4202939Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4203120Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4204132Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4204135Z 2025-09-07T07:34:42.4204343Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4204346Z 2025-09-07T07:34:42.4204347Z 2025-09-07T07:34:42.4204419Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4204602Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4204605Z 2025-09-07T07:34:42.4204692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4204787Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4204822Z inline_call [] 2025-09-07T07:34:42.4204875Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4204909Z inductor [] 2025-09-07T07:34:42.4204983Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4205056Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4205353Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4205470Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4205519Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4205671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4205758Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4205890Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4206008Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4206108Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4206150Z Traceback (most recent call last): 2025-09-07T07:34:42.4206285Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4206321Z self._run_test( 2025-09-07T07:34:42.4207450Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4207505Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4207574Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4207706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4207751Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4207791Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4207940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4207989Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4208027Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4208162Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4208205Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4208242Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4208387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4208467Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4208505Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4208656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4208701Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4208853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4208905Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4208945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4209086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4210071Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4210137Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4210254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4210318Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4210361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4210486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4210591Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4210632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4210772Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4210814Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4210850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4210988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4211029Z return aot_autograd( 2025-09-07T07:34:42.4211064Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4211199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4211268Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4211312Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4211474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4211556Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4211601Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4211782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4211842Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4212960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4213001Z fx_g = _create_graph( 2025-09-07T07:34:42.4213034Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4213202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4213235Z fx_g = make_fx( 2025-09-07T07:34:42.4213267Z ^^^^^^^^ 2025-09-07T07:34:42.4213419Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4213464Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4213501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4213650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4213692Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4213728Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4213885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4213922Z t = dispatch_trace( 2025-09-07T07:34:42.4213955Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4214069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4214110Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4214145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4214269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4214307Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4214343Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4214525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4215535Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4215576Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4215699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4215736Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4215801Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4215927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4215968Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4216001Z ^^^^^^^^^ 2025-09-07T07:34:42.4216132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4216175Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4216210Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4216356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4216406Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4216439Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4216669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4216733Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4216777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4216952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4216990Z outs_pair = fn(*args) 2025-09-07T07:34:42.4217050Z ^^^^^^^^^ 2025-09-07T07:34:42.4217224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4218239Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4218285Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4218458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4218500Z outs_pair = fn(*args) 2025-09-07T07:34:42.4218533Z ^^^^^^^^^ 2025-09-07T07:34:42.4218709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4218768Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4218809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4219006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4219075Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4219120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4219290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4219328Z outs_pair = fn(*args) 2025-09-07T07:34:42.4219362Z ^^^^^^^^^ 2025-09-07T07:34:42.4219552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4219595Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4219631Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4219800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4219871Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4219907Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4220031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4220073Z return handle_torch_function( 2025-09-07T07:34:42.4221040Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4221221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4221297Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4221340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4221507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4221551Z return func(*args, **kwargs) 2025-09-07T07:34:42.4221586Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4221710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4221752Z result = _engine_run_backward( 2025-09-07T07:34:42.4221786Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4221932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4222054Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4222103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4222229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4222270Z return user_fn(self, *args) 2025-09-07T07:34:42.4222324Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4222468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4222511Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4222547Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4222703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4222746Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4223716Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4223840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4223879Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4223913Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4224081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4224135Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4224175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4224309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4224358Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4224395Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4224557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4224603Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4224642Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4224800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4224838Z t = dispatch_trace( 2025-09-07T07:34:42.4224871Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4225008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4225050Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4225085Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4225208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4225246Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4225280Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4226401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4226550Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4226591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4226715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4226753Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4226790Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4226914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4226955Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4226988Z ^^^^^^^^^ 2025-09-07T07:34:42.4227139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4227189Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4227222Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4227263Z File "", line 1, in 2025-09-07T07:34:42.4227405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4227481Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4227526Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4227687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4227733Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4227770Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4227961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4228004Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4228981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4229152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4229195Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4229231Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4229373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4229417Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4229451Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4229587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4229673Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4229718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4229844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4229905Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4229947Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4230073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4230147Z leaves = list(leaves) 2025-09-07T07:34:42.4230181Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4230303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4230338Z return func(x) 2025-09-07T07:34:42.4230370Z ^^^^^^^ 2025-09-07T07:34:42.4230507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4230571Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4230651Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4231750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4231791Z return func(*args, **kwargs) 2025-09-07T07:34:42.4231826Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4232005Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4232092Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4232094Z 2025-09-07T07:34:42.4232300Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4232302Z 2025-09-07T07:34:42.4232304Z 2025-09-07T07:34:42.4232375Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4232561Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4232563Z 2025-09-07T07:34:42.4232647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4232721Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4232774Z inline_call [] 2025-09-07T07:34:42.4232829Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4232862Z inductor [] 2025-09-07T07:34:42.4232936Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4233008Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4233266Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4233381Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4233431Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4233580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4233666Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4233797Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4234847Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4234919Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4234953Z inline_call [] 2025-09-07T07:34:42.4235005Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4235080Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4235148Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4235403Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4235515Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4235590Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4235738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4235822Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4235950Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4236099Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4236149Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4236248Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4236290Z Traceback (most recent call last): 2025-09-07T07:34:42.4236425Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4236462Z self._run_test( 2025-09-07T07:34:42.4236628Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4236683Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4236723Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4236855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4237843Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4237883Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4238034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4238079Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4238116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4238252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4238323Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4238362Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4238505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4238586Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4238623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4238775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4238818Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4238968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4239020Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4239062Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4239205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4239255Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4239293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4239408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4239474Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4240491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4240618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4240681Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4240722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4240865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4240933Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4240970Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4241107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4241145Z return aot_autograd( 2025-09-07T07:34:42.4241179Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4241351Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4241421Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4241466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4241627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4241712Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4241757Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4241938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4241982Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4242168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4242208Z fx_g = _create_graph( 2025-09-07T07:34:42.4242242Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4242405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4243376Z fx_g = make_fx( 2025-09-07T07:34:42.4243410Z ^^^^^^^^ 2025-09-07T07:34:42.4243581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4243629Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4243665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4243811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4243853Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4243889Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4244050Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4244088Z t = dispatch_trace( 2025-09-07T07:34:42.4244121Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4244234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4244274Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4244312Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4244438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4244477Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4244513Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4244673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4244752Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4244792Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4244916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4244953Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4245922Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4246048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4246120Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4246153Z ^^^^^^^^^ 2025-09-07T07:34:42.4246285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4246324Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4246359Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4246558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4246650Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4246684Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4246840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4246902Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4246946Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4247124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4247163Z outs_pair = fn(*args) 2025-09-07T07:34:42.4247196Z ^^^^^^^^^ 2025-09-07T07:34:42.4247369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4247435Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4247480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4247653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4247690Z outs_pair = fn(*args) 2025-09-07T07:34:42.4248670Z ^^^^^^^^^ 2025-09-07T07:34:42.4248847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4248931Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4248972Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4249165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4249234Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4249282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4249455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4249494Z outs_pair = fn(*args) 2025-09-07T07:34:42.4249527Z ^^^^^^^^^ 2025-09-07T07:34:42.4249715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4249763Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4249800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4249967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4250012Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4250047Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4250174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4250216Z return handle_torch_function( 2025-09-07T07:34:42.4250252Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4250393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4250467Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4250538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4251642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4251683Z return func(*args, **kwargs) 2025-09-07T07:34:42.4251718Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4251841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4251915Z result = _engine_run_backward( 2025-09-07T07:34:42.4251951Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4252097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4252217Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4252267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4252397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4252438Z return user_fn(self, *args) 2025-09-07T07:34:42.4252473Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4252617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4252660Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4252697Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4252854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4252897Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4252932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4253055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4253111Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4253145Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4254242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4254294Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4254334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4254473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4254523Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4254560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4254721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4254768Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4254809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4254967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4255005Z t = dispatch_trace( 2025-09-07T07:34:42.4255038Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4255151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4255192Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4255227Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4255352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4255390Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4255425Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4255584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4255664Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4255724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4255847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4256869Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4256905Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4257031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4257117Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4257151Z ^^^^^^^^^ 2025-09-07T07:34:42.4257301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4257350Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4257383Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4257424Z File "", line 1, in 
2025-09-07T07:34:42.4257568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4257646Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4257691Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4257826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4257872Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4257910Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4258102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4258144Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4258180Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4258349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4258416Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4258451Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4259539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4259582Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4259617Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4259754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4259842Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4259886Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4260010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4260073Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4260117Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4260242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4260280Z leaves = list(leaves) 2025-09-07T07:34:42.4260314Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4260435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4260470Z return func(x) 2025-09-07T07:34:42.4260503Z ^^^^^^^ 2025-09-07T07:34:42.4260642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4260705Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4260746Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4260911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4260981Z return func(*args, **kwargs) 2025-09-07T07:34:42.4261016Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4261195Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4262209Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4262212Z 2025-09-07T07:34:42.4262453Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4262456Z 2025-09-07T07:34:42.4262457Z 2025-09-07T07:34:42.4262528Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4262713Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4262718Z 2025-09-07T07:34:42.4262803Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4262877Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4262911Z inline_call [] 2025-09-07T07:34:42.4262964Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4262997Z inductor [] 2025-09-07T07:34:42.4263070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4263145Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4263406Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4263520Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4263594Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4263747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4263833Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4263962Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4264080Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4264151Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4264185Z inline_call [] 2025-09-07T07:34:42.4264237Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4265242Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4265313Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4265574Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4265688Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4265738Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4265886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4265974Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4266103Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4266220Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4266291Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4266346Z inline_call [] 2025-09-07T07:34:42.4266398Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4266467Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4266601Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4266852Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4266997Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4267046Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4267195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4267277Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4267409Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4267524Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4267739Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-17114a6d624aa143.xml - 2025-09-07T07:34:42.4268741Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4269092Z FAILED [0.7905s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4269176Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4269178Z 2025-09-07T07:34:42.4269410Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4269414Z 2025-09-07T07:34:42.4269416Z 2025-09-07T07:34:42.4269487Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4269670Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4269673Z 2025-09-07T07:34:42.4269759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4269818Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.4269883Z ================== 1 failed, 245 deselected, 2 rerun in 3.01s ================== 2025-09-07T07:34:42.4269917Z Got exit code 1 2025-09-07T07:34:42.4269955Z Retrying single test... 2025-09-07T07:34:42.4270378Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4270420Z import pkg_resources 2025-09-07T07:34:42.4270589Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f19be8cedd747b33.xml 2025-09-07T07:34:42.4270646Z ============================= test session starts ============================== 2025-09-07T07:34:42.4270758Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4270797Z cachedir: .pytest_cache 2025-09-07T07:34:42.4270952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4270996Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4271054Z configfile: pytest.ini 2025-09-07T07:34:42.4271217Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4272255Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.4272477Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4272517Z Running 1 items in this shard 2025-09-07T07:34:42.4272558Z 2025-09-07T07:34:42.4272813Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:18.005746080 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4272919Z [W907 07:23:19.639283397 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4273024Z [W907 07:23:19.751024666 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.4273073Z ('RERUN', {'yellow': True}) [1.4994s] [100%] 2025-09-07T07:34:42.4273316Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:20.955269050 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4273415Z [W907 07:23:20.955749773 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4273463Z ('RERUN', {'yellow': True}) [0.8835s] [100%] 2025-09-07T07:34:42.4273704Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True [W907 07:23:21.655484172 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4273804Z [W907 07:23:21.655933946 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.4273839Z FAILED [0.7455s] [100%] 2025-09-07T07:34:42.4273860Z 2025-09-07T07:34:42.4273909Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4274006Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4274050Z Traceback (most recent call last): 2025-09-07T07:34:42.4274190Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4274228Z self._run_test( 2025-09-07T07:34:42.4274341Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4274397Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4274436Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4275503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4275549Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4275590Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4275742Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4275788Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4275825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4275961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4276004Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4276043Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4276188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4276267Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4276305Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4276456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4276568Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4276717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4276771Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4276810Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4277008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4277059Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4277097Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4277212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4277278Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4278265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4278395Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4278458Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4278501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4278641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4278685Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4278723Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4278861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4278899Z return aot_autograd( 2025-09-07T07:34:42.4278934Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4279068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4279163Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4279208Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4279368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4279450Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4279497Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4279680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4279722Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4279908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4279949Z fx_g = _create_graph( 2025-09-07T07:34:42.4279984Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4280183Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4281156Z fx_g = make_fx( 2025-09-07T07:34:42.4281187Z ^^^^^^^^ 2025-09-07T07:34:42.4281339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4281384Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4281422Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4281567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4281610Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4281644Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4281803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4281867Z t = dispatch_trace( 2025-09-07T07:34:42.4281901Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4282013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4282054Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4282088Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4282251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4282291Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4282326Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4282487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4282566Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4282607Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4282733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4283708Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.4283743Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4283871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4283912Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4283946Z ^^^^^^^^^ 2025-09-07T07:34:42.4284080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4284121Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4284155Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4284303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4284351Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4284405Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4284562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4284623Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4284667Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4284843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4284882Z outs_pair = fn(*args) 2025-09-07T07:34:42.4284917Z ^^^^^^^^^ 2025-09-07T07:34:42.4285088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4285154Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4285196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4285371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4285408Z outs_pair = fn(*args) 2025-09-07T07:34:42.4286377Z ^^^^^^^^^ 2025-09-07T07:34:42.4287572Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4287638Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4287681Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4287877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4287949Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4287993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4288194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4288232Z outs_pair = fn(*args) 2025-09-07T07:34:42.4288267Z ^^^^^^^^^ 2025-09-07T07:34:42.4288456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4288512Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4288614Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4288785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4288830Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4288868Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4288994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4289041Z return handle_torch_function( 2025-09-07T07:34:42.4289076Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4290513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4290593Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.4290643Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4290815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4290857Z return func(*args, **kwargs) 2025-09-07T07:34:42.4290893Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4291018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4291060Z result = _engine_run_backward( 2025-09-07T07:34:42.4291130Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4291284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4291405Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4291454Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4291581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4291624Z return user_fn(self, *args) 2025-09-07T07:34:42.4291659Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4291803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4291845Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4291882Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4292040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4292087Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4292123Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4292249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4293307Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.4293396Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4293564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4293617Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4293656Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4293792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4293843Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4293901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4294062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4294111Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4294149Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4294309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4294387Z t = dispatch_trace( 2025-09-07T07:34:42.4294422Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4294535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4294577Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4294613Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4294738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4294779Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4294813Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4294973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4295052Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4296065Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4296191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4296230Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4296263Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4296389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4296429Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4296581Z ^^^^^^^^^ 2025-09-07T07:34:42.4296732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4296781Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4296814Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4296856Z File "", line 1, in 2025-09-07T07:34:42.4296999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4297078Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4297122Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4297259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4297305Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4297343Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4297539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4297582Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4297618Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4297788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4298838Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4298879Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.4299022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4299064Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4299100Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4299232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4299346Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4299392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4299515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4299575Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4299619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4299765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4299804Z leaves = list(leaves) 2025-09-07T07:34:42.4299838Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4299963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4299998Z return func(x) 2025-09-07T07:34:42.4300030Z ^^^^^^^ 2025-09-07T07:34:42.4300169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4300234Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4300275Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4300442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4300484Z return func(*args, **kwargs) 2025-09-07T07:34:42.4301481Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4301662Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4301748Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4301751Z 2025-09-07T07:34:42.4301956Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4301986Z 2025-09-07T07:34:42.4301988Z 2025-09-07T07:34:42.4302061Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4302245Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4302247Z 2025-09-07T07:34:42.4302335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4302410Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4302445Z inline_call [] 2025-09-07T07:34:42.4302498Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4302532Z inductor [] 2025-09-07T07:34:42.4302605Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4302677Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4302937Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4303052Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4303102Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4303270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4303357Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4303489Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4303609Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4303708Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4304737Z Traceback (most recent call last): 2025-09-07T07:34:42.4304875Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4304909Z self._run_test( 2025-09-07T07:34:42.4305021Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4305076Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4305116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4305264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4305310Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4305349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4305499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4305550Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4305588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4305724Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4305767Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4305804Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4305948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4306029Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4306067Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4306218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4306263Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4306413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4306558Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4307566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4307711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4307764Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4307802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4307920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4307984Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4308029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4308154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4308220Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4308260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4308400Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4308443Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4308480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4308645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4308684Z return aot_autograd( 2025-09-07T07:34:42.4308721Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4308856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4308925Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4308969Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4309150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4309233Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4310239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4310423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4310486Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4310672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4310711Z fx_g = _create_graph( 2025-09-07T07:34:42.4310746Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4310911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4310948Z fx_g = make_fx( 2025-09-07T07:34:42.4310981Z ^^^^^^^^ 2025-09-07T07:34:42.4311132Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4311178Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4311215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4311363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4311405Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4311441Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4311599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4311637Z t = dispatch_trace( 2025-09-07T07:34:42.4311670Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4311805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4311848Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4311883Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4312008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4313005Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4313041Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4313204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4313284Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4313324Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4313449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4313490Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4313525Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4313651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4313691Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4313724Z ^^^^^^^^^ 2025-09-07T07:34:42.4313876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4313917Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4313953Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4314100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4314150Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4314183Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4314340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4314417Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4314461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4314636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4315622Z outs_pair = fn(*args) 2025-09-07T07:34:42.4315658Z ^^^^^^^^^ 2025-09-07T07:34:42.4315846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4315912Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4315956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4316130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4316171Z outs_pair = fn(*args) 2025-09-07T07:34:42.4316204Z ^^^^^^^^^ 2025-09-07T07:34:42.4316382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4316440Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4316547Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4316745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4316815Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4316860Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4317032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4317093Z outs_pair = fn(*args) 2025-09-07T07:34:42.4317129Z ^^^^^^^^^ 2025-09-07T07:34:42.4317318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4317363Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4317399Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4317567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4317614Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4318611Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4318738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4318779Z return handle_torch_function( 2025-09-07T07:34:42.4318816Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4318958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4319034Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4319078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4319244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4319306Z return func(*args, **kwargs) 2025-09-07T07:34:42.4319342Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4319466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4319509Z result = _engine_run_backward( 2025-09-07T07:34:42.4319544Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4319690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4319815Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4319884Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4320009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4320051Z return user_fn(self, *args) 2025-09-07T07:34:42.4320087Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4320301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4320346Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4321345Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4321503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4321547Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4321586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4321709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4321748Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4321782Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4321950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4322000Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4322041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4322177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4322226Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4322263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4322425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4322495Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4322534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4322692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4322730Z t = dispatch_trace( 2025-09-07T07:34:42.4322764Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4322878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4322919Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4322955Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4324026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4324066Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4324099Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4324263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4324341Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4324382Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4324505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4324567Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4324602Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4324729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4324771Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4324804Z ^^^^^^^^^ 2025-09-07T07:34:42.4324954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4325004Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4325053Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4325094Z File "", line 1, in 2025-09-07T07:34:42.4325237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4325313Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4325359Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4325511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4325559Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4325595Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4326801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4326846Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4326884Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4327056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4327100Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4327136Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4327280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4327323Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4327358Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4327491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4327581Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4327626Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4327781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4327842Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4327885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4328011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4328050Z leaves = list(leaves) 2025-09-07T07:34:42.4328084Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4328207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4328242Z return func(x) 2025-09-07T07:34:42.4328274Z ^^^^^^^ 2025-09-07T07:34:42.4329374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4329439Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4329483Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4329648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4329689Z return func(*args, **kwargs) 2025-09-07T07:34:42.4329723Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4329930Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4330016Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4330019Z 2025-09-07T07:34:42.4330226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4330228Z 2025-09-07T07:34:42.4330230Z 2025-09-07T07:34:42.4330302Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4330508Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4330510Z 2025-09-07T07:34:42.4330595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4330669Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4330704Z inline_call [] 2025-09-07T07:34:42.4330757Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4330809Z inductor [] 2025-09-07T07:34:42.4330884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4330954Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4331216Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4331334Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4331384Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4331534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4332582Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4332716Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4332836Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4332908Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4332942Z inline_call [] 2025-09-07T07:34:42.4332995Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4333087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4333158Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4333411Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4333523Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4333575Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4333723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4333808Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4333935Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4334054Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4334104Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4334202Z _ WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.4334244Z Traceback (most recent call last): 2025-09-07T07:34:42.4334395Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1506, in test_while_loop_with_conv 2025-09-07T07:34:42.4334430Z self._run_test( 2025-09-07T07:34:42.4334542Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4335549Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4335591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4335723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4335772Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4335827Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4335979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4336025Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4336063Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4336201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4336258Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4336296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4336438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4336566Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4336604Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4336759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4336804Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4336956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4337008Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4337049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4337191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4337242Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4337279Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4338365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4338462Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4338509Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4338635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4338698Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4338739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4338880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4338923Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4338960Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4339096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4339135Z return aot_autograd( 2025-09-07T07:34:42.4339170Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4339308Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4339378Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4339422Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4339584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4339686Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4339732Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4339913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4339956Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4340141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4341155Z fx_g = _create_graph( 2025-09-07T07:34:42.4341190Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4341358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4341391Z fx_g = make_fx( 2025-09-07T07:34:42.4341423Z ^^^^^^^^ 2025-09-07T07:34:42.4341594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4341641Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4341678Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4341824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4341865Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4341902Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4342066Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4342104Z t = dispatch_trace( 2025-09-07T07:34:42.4342137Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4342250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4342290Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4342328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4342452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4342491Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4342526Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4342686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4342764Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4343789Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4343915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4343952Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4343987Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4344114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4344155Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4344190Z ^^^^^^^^^ 2025-09-07T07:34:42.4344323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4344362Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4344397Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4344545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4344597Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4344631Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4344786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4344847Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4344910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4345086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4345126Z outs_pair = fn(*args) 2025-09-07T07:34:42.4345160Z ^^^^^^^^^ 2025-09-07T07:34:42.4345332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4345396Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4346407Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4346646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4346685Z outs_pair = fn(*args) 2025-09-07T07:34:42.4346719Z ^^^^^^^^^ 2025-09-07T07:34:42.4346895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4346977Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4347020Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4347214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4347284Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4347329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4347503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4347542Z outs_pair = fn(*args) 2025-09-07T07:34:42.4347575Z ^^^^^^^^^ 2025-09-07T07:34:42.4347766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4347811Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4347848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4348016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4348062Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4348098Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4348248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4348290Z return handle_torch_function( 2025-09-07T07:34:42.4348327Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4349435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4349511Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4349556Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4349725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4349765Z return func(*args, **kwargs) 2025-09-07T07:34:42.4349800Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4349923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4349967Z result = _engine_run_backward( 2025-09-07T07:34:42.4350002Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4350148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4350268Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4350341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4350470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4350511Z return user_fn(self, *args) 2025-09-07T07:34:42.4350546Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4350690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4350733Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4350769Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4350948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4350991Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4351028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4351151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4352149Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4352199Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4352366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4352417Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4352458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4352592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4352645Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4352683Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4352845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4352891Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4352932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4353090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4353128Z t = dispatch_trace( 2025-09-07T07:34:42.4353162Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4353275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4353317Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4353353Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4353494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4353534Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4353567Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4353726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4354762Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4354806Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4354930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4354969Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4355002Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4355127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4355171Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4355205Z ^^^^^^^^^ 2025-09-07T07:34:42.4355354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4355402Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4355435Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4355477Z File "", line 1, in 
2025-09-07T07:34:42.4355639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4355718Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4355763Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4355899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4355946Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4356002Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4356196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4356239Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4356274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4356446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4357555Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4357594Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4357739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4357780Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4357816Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4357949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4358040Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4358084Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4358209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4358270Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4358314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4358441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4358478Z leaves = list(leaves) 2025-09-07T07:34:42.4358513Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4358635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4358690Z return func(x) 2025-09-07T07:34:42.4358724Z ^^^^^^^ 2025-09-07T07:34:42.4358861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4358925Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4358966Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4359133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4360131Z return func(*args, **kwargs) 2025-09-07T07:34:42.4360221Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4360403Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4360489Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4360491Z 2025-09-07T07:34:42.4360698Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4360705Z 2025-09-07T07:34:42.4360707Z 2025-09-07T07:34:42.4360779Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4360986Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4360989Z 2025-09-07T07:34:42.4361077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4361151Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4361185Z inline_call [] 2025-09-07T07:34:42.4361239Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4361273Z inductor [] 2025-09-07T07:34:42.4361347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4361421Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4361696Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4361811Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4361863Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4362034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4362121Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4362252Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4362372Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4362446Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4363451Z inline_call [] 2025-09-07T07:34:42.4363505Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4363577Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4363647Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4363904Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4364016Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4364065Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4364214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4364321Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4364449Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4364568Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4364639Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4364673Z inline_call [] 2025-09-07T07:34:42.4364726Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4364798Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4364867Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4365122Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4365234Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1089, in forward 2025-09-07T07:34:42.4365283Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4365431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4365528Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4366681Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4366801Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4367021Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f19be8cedd747b33.xml - 2025-09-07T07:34:42.4367079Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4367458Z FAILED [0.7455s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4367542Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4367545Z 2025-09-07T07:34:42.4367769Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4367772Z 2025-09-07T07:34:42.4367774Z 2025-09-07T07:34:42.4367846Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4368029Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.4368032Z 2025-09-07T07:34:42.4368116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4368176Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.4368242Z ================== 1 failed, 245 deselected, 2 rerun in 3.32s ================== 2025-09-07T07:34:42.4368277Z Got exit code 1 2025-09-07T07:34:42.4368402Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.4368828Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4368868Z import pkg_resources 2025-09-07T07:34:42.4369039Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-b4fb1e13ef58b0ad.xml 2025-09-07T07:34:42.4369115Z ============================= test session starts ============================== 2025-09-07T07:34:42.4369229Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4369269Z cachedir: .pytest_cache 2025-09-07T07:34:42.4369425Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4370444Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4370485Z configfile: pytest.ini 2025-09-07T07:34:42.4370647Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4370722Z collecting ... collected 467 items / 49 deselected / 418 selected 2025-09-07T07:34:42.4370772Z stepcurrent: skipping 49 already run items. 
2025-09-07T07:34:42.4370813Z Running 197 items in this shard 2025-09-07T07:34:42.4370819Z 2025-09-07T07:34:42.4371009Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_False PASSED [1.5483s] [ 0%] 2025-09-07T07:34:42.4371186Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_True PASSED [0.6735s] [ 1%] 2025-09-07T07:34:42.4371383Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_False PASSED [1.8192s] [ 1%] 2025-09-07T07:34:42.4371562Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True_autograd_False PASSED [1.9455s] [ 2%] 2025-09-07T07:34:42.4371738Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_False PASSED [0.6309s] [ 2%] 2025-09-07T07:34:42.4371911Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_False PASSED [1.4543s] [ 3%] 2025-09-07T07:34:42.4372107Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_True PASSED [1.4473s] [ 3%] 2025-09-07T07:34:42.4372287Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_False PASSED [0.8343s] [ 4%] 2025-09-07T07:34:42.4372478Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_True_autograd_True PASSED [1.6547s] [ 4%] 2025-09-07T07:34:42.4372649Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_False PASSED [1.0629s] [ 5%] 2025-09-07T07:34:42.4372814Z 
inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_False PASSED [0.5561s] [ 5%] 2025-09-07T07:34:42.4373003Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2425s] [ 6%] 2025-09-07T07:34:42.4373193Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2166s] [ 6%] 2025-09-07T07:34:42.4373356Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True FAILED [0.2322s] [ 6%] 2025-09-07T07:34:42.4373360Z 2025-09-07T07:34:42.4373409Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4374476Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4374521Z Traceback (most recent call last): 2025-09-07T07:34:42.4374670Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4374705Z self._run_test( 2025-09-07T07:34:42.4374838Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4374897Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4374937Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4375072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4375118Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4375160Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4375315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4375362Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4375399Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4375536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4375581Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4375619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4375762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4375844Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4375882Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4376051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4376097Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4376246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4377324Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4377366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4377509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4377591Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4377629Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4377745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4377811Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4377857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4378004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4378068Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4378109Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4378250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4378298Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4378334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4378474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4378514Z return aot_autograd( 2025-09-07T07:34:42.4378549Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4378686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4378756Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4378802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4378961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4379044Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4380077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4380262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4380306Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4380492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4380532Z fx_g = _create_graph( 2025-09-07T07:34:42.4380569Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4380733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4380767Z fx_g = make_fx( 2025-09-07T07:34:42.4380799Z ^^^^^^^^ 2025-09-07T07:34:42.4380951Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4381001Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4381038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4381186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4381229Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4381267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4381448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4381487Z t = dispatch_trace( 2025-09-07T07:34:42.4381521Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4381633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4381675Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4381710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4382791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4382858Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4382894Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4383055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4383134Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4383175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4383317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4383355Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4383390Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4383515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4383556Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4383590Z ^^^^^^^^^ 2025-09-07T07:34:42.4383725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4383765Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4383800Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4383949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4384000Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4384033Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4384190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4384252Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4384295Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4384470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4385484Z outs_pair = fn(*args) 2025-09-07T07:34:42.4385519Z ^^^^^^^^^ 2025-09-07T07:34:42.4385692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4385758Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4385803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4385978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4386016Z outs_pair = fn(*args) 2025-09-07T07:34:42.4386050Z ^^^^^^^^^ 2025-09-07T07:34:42.4386226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4386289Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4386333Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4386608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4386678Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4386752Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4386925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4386965Z outs_pair = fn(*args) 2025-09-07T07:34:42.4386998Z ^^^^^^^^^ 2025-09-07T07:34:42.4387189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4387235Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4387292Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4387462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4388514Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4388551Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4388678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4388742Z return handle_torch_function( 2025-09-07T07:34:42.4388778Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4388919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4388994Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4389038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4389206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4389248Z return func(*args, **kwargs) 2025-09-07T07:34:42.4389284Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4389408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4389450Z result = _engine_run_backward( 2025-09-07T07:34:42.4389486Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4389633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4389754Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4389803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4389930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4389992Z return user_fn(self, *args) 2025-09-07T07:34:42.4390028Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4390172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4390215Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4391212Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4391374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4391417Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4391453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4391578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4391617Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4391652Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4391821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4391872Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4391912Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4392048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4392113Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4392153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4392315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4392361Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4392401Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4392559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4392618Z t = dispatch_trace( 2025-09-07T07:34:42.4392651Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4392764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4392805Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4393795Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4393920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4393977Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4394012Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4394172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4394250Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4394290Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4394418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4394455Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4394490Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4394615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4394656Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4394691Z ^^^^^^^^^ 2025-09-07T07:34:42.4394841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4394889Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4394923Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4394963Z File "", line 1, in 2025-09-07T07:34:42.4395107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4395203Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4395248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4395383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4395430Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4396420Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4396675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4396718Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4396753Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4396923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4396970Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4397008Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4397152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4397193Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4397229Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4397387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4397478Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4397524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4397650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4397710Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4397754Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4397905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4397943Z leaves = list(leaves) 2025-09-07T07:34:42.4397978Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4398100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4398135Z return func(x) 2025-09-07T07:34:42.4398168Z ^^^^^^^ 2025-09-07T07:34:42.4399293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4399359Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4399400Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4399566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4399608Z return func(*args, **kwargs) 2025-09-07T07:34:42.4399646Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4399829Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4399914Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4399917Z 2025-09-07T07:34:42.4400127Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4400130Z 2025-09-07T07:34:42.4400132Z 2025-09-07T07:34:42.4400273Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4400465Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4400467Z 2025-09-07T07:34:42.4400553Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4400651Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4400685Z inline_call [] 2025-09-07T07:34:42.4400743Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4400816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4400889Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4401150Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4401262Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4401339Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4401490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4402557Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4402689Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4402808Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4402938Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4402983Z Traceback (most recent call last): 2025-09-07T07:34:42.4403130Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4403165Z self._run_test( 2025-09-07T07:34:42.4403278Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4403332Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4403374Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4403522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4403569Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4403607Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4403760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4403805Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4403858Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4403995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4404038Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4404075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4404217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4404299Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4404337Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4405449Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4405494Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4405646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4405698Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4405739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4405880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4405931Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4405988Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4406106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4406171Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4406216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4406343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4406407Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4406448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4406666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4406710Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4406747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4406884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4406927Z return aot_autograd( 2025-09-07T07:34:42.4406961Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4407098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.4408135Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4408205Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4408368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4408451Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4408495Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4408676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4408739Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4408925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4408963Z fx_g = _create_graph( 2025-09-07T07:34:42.4408997Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4409161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4409214Z fx_g = make_fx( 2025-09-07T07:34:42.4409247Z ^^^^^^^^ 2025-09-07T07:34:42.4409399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4409446Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4409483Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4409632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4409677Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4409713Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4409872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.4409910Z t = dispatch_trace( 2025-09-07T07:34:42.4409944Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4411019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4411062Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4411098Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4411222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4411262Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4411296Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4411484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4411563Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4411603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4411727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4411767Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4411802Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4411926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4411968Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4412001Z ^^^^^^^^^ 2025-09-07T07:34:42.4412134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4412176Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4412214Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4412361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4412410Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4412443Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4413572Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4413636Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4413681Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4413858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4413897Z outs_pair = fn(*args) 2025-09-07T07:34:42.4413931Z ^^^^^^^^^ 2025-09-07T07:34:42.4414105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4414189Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4414233Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4414406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4414445Z outs_pair = fn(*args) 2025-09-07T07:34:42.4414492Z ^^^^^^^^^ 2025-09-07T07:34:42.4414671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4414730Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4414773Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4414967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4415040Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4415084Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4415257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4415295Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.4415329Z ^^^^^^^^^ 2025-09-07T07:34:42.4415518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4416583Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4416620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4416792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4416872Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4416908Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4417033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4417075Z return handle_torch_function( 2025-09-07T07:34:42.4417111Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4417254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4417328Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4417372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4417539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4417581Z return func(*args, **kwargs) 2025-09-07T07:34:42.4417619Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4417741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4417783Z result = _engine_run_backward( 2025-09-07T07:34:42.4417817Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4417985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4418107Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.4418157Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4418283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4419293Z return user_fn(self, *args) 2025-09-07T07:34:42.4419330Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4419502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4419544Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4419580Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4419738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4419783Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4419820Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4419963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4420003Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4420037Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4420202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4420256Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4420297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4420433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4420482Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4420520Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4420685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4420732Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.4420771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4420928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4420966Z t = dispatch_trace( 2025-09-07T07:34:42.4421958Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4422123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4422168Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4422204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4422328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4422366Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4422401Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4422563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4422641Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4422681Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4422806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4422847Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4422883Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4423010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4423050Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4423083Z ^^^^^^^^^ 2025-09-07T07:34:42.4423248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4423297Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4423332Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4423373Z File "", line 1, in 2025-09-07T07:34:42.4423517Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4423593Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4424595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4424753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4424801Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4424838Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4425033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4425075Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4425127Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4425298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4425342Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4425378Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4425520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4425566Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4425602Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4425738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4425826Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4425873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4425999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4426059Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.4426102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4426228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4426287Z leaves = list(leaves) 2025-09-07T07:34:42.4427343Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4427468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4427502Z return func(x) 2025-09-07T07:34:42.4427534Z ^^^^^^^ 2025-09-07T07:34:42.4427672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4427737Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4427779Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4427945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4427986Z return func(*args, **kwargs) 2025-09-07T07:34:42.4428020Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4428199Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4428287Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4428289Z 2025-09-07T07:34:42.4428496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4428498Z 2025-09-07T07:34:42.4428500Z 2025-09-07T07:34:42.4428596Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4428792Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4428794Z 2025-09-07T07:34:42.4428879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4428954Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4428988Z inline_call [] 2025-09-07T07:34:42.4429069Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4429140Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4429212Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4429470Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4430563Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4430640Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4430792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4430877Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4431009Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4431130Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4431201Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4431235Z inline_call [] 2025-09-07T07:34:42.4431293Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4431365Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4431436Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4431689Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4431799Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4431899Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4432050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4432134Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4432264Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4432383Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4432433Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4432536Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4432580Z Traceback (most recent call last): 2025-09-07T07:34:42.4433686Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4433728Z self._run_test( 2025-09-07T07:34:42.4433840Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4433895Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4433935Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4434087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4434136Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4434175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4434324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4434371Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4434408Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4434547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4434607Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4434645Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4434786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4434866Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4434905Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4435075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4435122Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4435271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4435325Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4435367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4436461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4436572Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4436612Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4436729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4436797Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4436840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4436966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4437028Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4437071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4437237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4437282Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4437319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4437456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4437495Z return aot_autograd( 2025-09-07T07:34:42.4437531Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4437668Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4437738Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4437783Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4437944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4438029Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4438073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4439226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4439271Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4439479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4439520Z fx_g = _create_graph( 2025-09-07T07:34:42.4439554Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4439717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4439750Z fx_g = make_fx( 2025-09-07T07:34:42.4439783Z ^^^^^^^^ 2025-09-07T07:34:42.4439936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4440003Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4440041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4440246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4440291Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4440326Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4440514Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4440552Z t = dispatch_trace( 2025-09-07T07:34:42.4440587Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4440699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4440740Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4440781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4440906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4440945Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4441950Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4442114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4442195Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4442236Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4442360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4442398Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4442433Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4442558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4442629Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4442663Z ^^^^^^^^^ 2025-09-07T07:34:42.4442794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4442835Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4442869Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4443019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4443068Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4443103Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4443258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4443320Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4443362Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4443540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4443579Z outs_pair = fn(*args) 2025-09-07T07:34:42.4444569Z ^^^^^^^^^ 2025-09-07T07:34:42.4444741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4444827Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4444872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4445047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4445085Z outs_pair = fn(*args) 2025-09-07T07:34:42.4445119Z ^^^^^^^^^ 2025-09-07T07:34:42.4445295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4445374Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4445416Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4445610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4445680Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4445742Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4445917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4445956Z outs_pair = fn(*args) 2025-09-07T07:34:42.4445989Z ^^^^^^^^^ 2025-09-07T07:34:42.4446179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4446225Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4446262Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4446429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4446475Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4446588Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4447686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4447730Z return handle_torch_function( 2025-09-07T07:34:42.4447766Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4447906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4447980Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4448056Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4448224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4448265Z return func(*args, **kwargs) 2025-09-07T07:34:42.4448300Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4448426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4448467Z result = _engine_run_backward( 2025-09-07T07:34:42.4448504Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4448649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4448770Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4448818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4448949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4448990Z return user_fn(self, *args) 2025-09-07T07:34:42.4449026Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4449169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4449231Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4449268Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4450387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4450432Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4450468Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4450589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4450660Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4450695Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4450861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4450911Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4450951Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4451108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4451158Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4451196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4451357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4451404Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4451445Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4451605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4451642Z t = dispatch_trace( 2025-09-07T07:34:42.4451676Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4451788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4451832Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4451868Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4451991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4452988Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4453024Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4453186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4453287Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4453329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4453454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4453491Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4453526Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4453653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4453695Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4453728Z ^^^^^^^^^ 2025-09-07T07:34:42.4453877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4453925Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4453957Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4453998Z File "", line 1, in 
2025-09-07T07:34:42.4454148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4454227Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4454271Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4454407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4454469Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4454509Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4454700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4455704Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4455739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4455911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4455975Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4456012Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4456153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4456196Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4456232Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4456382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4456470Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4456593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4456719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4456781Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4456825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4456951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4456989Z leaves = list(leaves) 2025-09-07T07:34:42.4457023Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4457148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4457183Z return func(x) 2025-09-07T07:34:42.4457217Z ^^^^^^^ 2025-09-07T07:34:42.4457355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4458388Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4458430Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4458598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4458667Z return func(*args, **kwargs) 2025-09-07T07:34:42.4458703Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4458883Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4458968Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4458971Z 2025-09-07T07:34:42.4459179Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4459183Z 2025-09-07T07:34:42.4459185Z 2025-09-07T07:34:42.4459256Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4459447Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4459453Z 2025-09-07T07:34:42.4459538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4459611Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4459645Z inline_call [] 2025-09-07T07:34:42.4459701Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4459794Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4459868Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4460126Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4460237Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4460313Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4460495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4460581Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4461694Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4461816Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4461908Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4461943Z inline_call [] 2025-09-07T07:34:42.4461999Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4462071Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4462140Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4462397Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4462509Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4462584Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4462735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4462821Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4462950Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4463068Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4463137Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4463190Z inline_call [] 2025-09-07T07:34:42.4463244Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4463315Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4463384Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4463640Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4463748Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4463822Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4464931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4465018Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4465148Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4465266Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4465499Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-b4fb1e13ef58b0ad.xml - 2025-09-07T07:34:42.4465559Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4465916Z FAILED [0.2322s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4466000Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4466020Z 2025-09-07T07:34:42.4466226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4466230Z 2025-09-07T07:34:42.4466231Z 2025-09-07T07:34:42.4466302Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4466559Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4466582Z 2025-09-07T07:34:42.4466668Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4466728Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.4466797Z ============ 1 failed, 11 passed, 49 deselected, 2 rerun in 14.55s ============= 2025-09-07T07:34:42.4466832Z Got exit code 1 2025-09-07T07:34:42.4466872Z Retrying single test... 2025-09-07T07:34:42.4467298Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4467338Z import pkg_resources 2025-09-07T07:34:42.4467509Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-aa44cd38caeaebd1.xml 2025-09-07T07:34:42.4467566Z ============================= test session starts ============================== 2025-09-07T07:34:42.4467679Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4468703Z cachedir: .pytest_cache 2025-09-07T07:34:42.4468860Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4468933Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4468970Z configfile: pytest.ini 2025-09-07T07:34:42.4469132Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4469207Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.4469438Z stepcurrent: skipping 60 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4469480Z Running 1 items in this shard 2025-09-07T07:34:42.4469483Z 2025-09-07T07:34:42.4469676Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4344s] [100%] 2025-09-07T07:34:42.4469868Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2369s] [100%] 2025-09-07T07:34:42.4470037Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True FAILED [0.2232s] [100%] 2025-09-07T07:34:42.4470039Z 2025-09-07T07:34:42.4470089Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4470193Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4470255Z Traceback (most recent call last): 2025-09-07T07:34:42.4470407Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4470442Z self._run_test( 2025-09-07T07:34:42.4470554Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4470609Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4470651Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4470831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4470877Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4470916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4472037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4472084Z 
return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4472123Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4472280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4472324Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4472361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4472504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4472588Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4472629Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4472780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4472827Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4472980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4473035Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4473075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4473217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4473268Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4473307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4473442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4473509Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4473552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4473680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4474701Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.4474746Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4474886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4474931Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4474967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4475106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4475148Z return aot_autograd( 2025-09-07T07:34:42.4475182Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4475318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4475387Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4475431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4475612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4475696Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4475740Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4475924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4475968Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4476173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4476213Z fx_g = _create_graph( 2025-09-07T07:34:42.4476248Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4476411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4476446Z fx_g = make_fx( 2025-09-07T07:34:42.4476569Z 
^^^^^^^^ 2025-09-07T07:34:42.4477694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4477740Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4477778Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4477925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4477973Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4478008Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4478168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4478205Z t = dispatch_trace( 2025-09-07T07:34:42.4478239Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4478353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4478396Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4478431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4478556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4478596Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4478630Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4478793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4478902Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4478944Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4479070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4479110Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4479144Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4480286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.4480329Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4480364Z ^^^^^^^^^ 2025-09-07T07:34:42.4480497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4480538Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4480576Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4480726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4480776Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4480809Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4480964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4481058Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4481103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4481279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4481317Z outs_pair = fn(*args) 2025-09-07T07:34:42.4481352Z ^^^^^^^^^ 2025-09-07T07:34:42.4481525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4481615Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4481660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4481832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4481873Z outs_pair = fn(*args) 2025-09-07T07:34:42.4481906Z ^^^^^^^^^ 2025-09-07T07:34:42.4482104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4483127Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4483170Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4483365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4483439Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4483483Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4483655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4483693Z outs_pair = fn(*args) 2025-09-07T07:34:42.4483729Z ^^^^^^^^^ 2025-09-07T07:34:42.4483920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4483966Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4484001Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4484169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4484234Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4484272Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4484396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4484439Z return handle_torch_function( 2025-09-07T07:34:42.4484474Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4484616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4484691Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4484736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4484903Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4485899Z return func(*args, **kwargs) 2025-09-07T07:34:42.4485935Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4486063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4486104Z result = _engine_run_backward( 2025-09-07T07:34:42.4486139Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4486286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4486425Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4486475Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4486673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4486713Z return user_fn(self, *args) 2025-09-07T07:34:42.4486750Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4486893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4486966Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4487002Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4487160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4487204Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4487240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4487382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4487422Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4487457Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4487621Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4487672Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4488699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4488840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4488889Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4488928Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4489090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4489140Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4489178Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4489340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4489377Z t = dispatch_trace( 2025-09-07T07:34:42.4489411Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4489523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4489592Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4489628Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4489751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4489790Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4489824Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4489986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4490064Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4490105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4490228Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4490266Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4490300Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4491390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4491431Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4491466Z ^^^^^^^^^ 2025-09-07T07:34:42.4491615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4491687Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4491721Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4491765Z File "", line 1, in 2025-09-07T07:34:42.4491907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4491984Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4492028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4492166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4492232Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4492270Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4492464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4492507Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4492543Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4492731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4492775Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4492811Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4492954Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4492996Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4493991Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4494126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4494213Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4494258Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4494384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4494443Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4494487Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4494613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4494651Z leaves = list(leaves) 2025-09-07T07:34:42.4494684Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4494832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4494866Z return func(x) 2025-09-07T07:34:42.4494899Z ^^^^^^^ 2025-09-07T07:34:42.4495037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4495102Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4495144Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4495314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4495353Z return func(*args, **kwargs) 2025-09-07T07:34:42.4495389Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4495568Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4495654Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.4495658Z 2025-09-07T07:34:42.4495865Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4495867Z 2025-09-07T07:34:42.4495869Z 2025-09-07T07:34:42.4496964Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4497183Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4497186Z 2025-09-07T07:34:42.4497273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4497347Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4497382Z inline_call [] 2025-09-07T07:34:42.4497437Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4497494Z inductor [] 2025-09-07T07:34:42.4497567Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4497638Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4497898Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4498030Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4498107Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4498259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4498345Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4498477Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4498598Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4498704Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4498747Z Traceback (most recent call last): 2025-09-07T07:34:42.4498896Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4498931Z self._run_test( 2025-09-07T07:34:42.4499045Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4500074Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4500115Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4500247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4500325Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4500366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4500516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4500562Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4500600Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4500736Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4500780Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4500818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4500961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4501041Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4501079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4501234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4501278Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4501431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4501483Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4501540Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4501684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4501735Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4501772Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4502850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4502918Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4502982Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4503108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4503171Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4503212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4503375Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4503419Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4503456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4503592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4503632Z return aot_autograd( 2025-09-07T07:34:42.4503666Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4503802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4503870Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4503915Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4504075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4504159Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4504206Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4504390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4504432Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4504619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4505636Z fx_g = _create_graph( 2025-09-07T07:34:42.4505672Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4505835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4505869Z fx_g = make_fx( 2025-09-07T07:34:42.4505901Z ^^^^^^^^ 2025-09-07T07:34:42.4506054Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4506101Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4506138Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4506283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4506325Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4506361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4506592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4506630Z t = dispatch_trace( 2025-09-07T07:34:42.4506663Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4506778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4506818Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4506879Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4507006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4507045Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4507080Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4507243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4507320Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4508352Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4508480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4508518Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4508552Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4508680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4508722Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4508781Z ^^^^^^^^^ 2025-09-07T07:34:42.4508915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4508955Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4508990Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4509139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4509192Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4509225Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4509383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4509445Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4509491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4509668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4509707Z outs_pair = fn(*args) 2025-09-07T07:34:42.4509741Z ^^^^^^^^^ 2025-09-07T07:34:42.4509913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4509979Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4511006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4511180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4511218Z outs_pair = fn(*args) 2025-09-07T07:34:42.4511251Z ^^^^^^^^^ 2025-09-07T07:34:42.4511430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4511491Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4511534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4511729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4511798Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4511847Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4512019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4512056Z outs_pair = fn(*args) 2025-09-07T07:34:42.4512090Z ^^^^^^^^^ 2025-09-07T07:34:42.4512296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4512342Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4512380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4512549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4512595Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4512631Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4512757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4512818Z return handle_torch_function( 2025-09-07T07:34:42.4512855Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4513947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4514024Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4514070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4514253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4514294Z return func(*args, **kwargs) 2025-09-07T07:34:42.4514330Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4514453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4514495Z result = _engine_run_backward( 2025-09-07T07:34:42.4514533Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4514680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4514800Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4514850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4514976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4515017Z return user_fn(self, *args) 2025-09-07T07:34:42.4515052Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4515197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4515239Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4515295Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4515453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4515496Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4515531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4516672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4516712Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4516748Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4516914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4516966Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4517005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4517140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4517192Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4517230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4517392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4517438Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4517504Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4517666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4517705Z t = dispatch_trace( 2025-09-07T07:34:42.4517738Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4517852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4517894Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4517930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4518078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4518117Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4518151Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4518312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4519362Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4519432Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4519558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4519597Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4519631Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4519757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4519799Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4519835Z ^^^^^^^^^ 2025-09-07T07:34:42.4519985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4520034Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4520067Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4520108Z File "<string>", line 1, in <lambda> 2025-09-07T07:34:42.4520300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4520378Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4520423Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4520562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4520609Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4520669Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4520865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4520907Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4520943Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4521114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4522129Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4522167Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4522311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4522353Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4522389Z ^^^^^^^^^^^^^^^^^^^^^
2025-09-07T07:34:42.4522523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4522612Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4522656Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4522781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4522859Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4522905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4523031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4523069Z leaves = list(leaves) 2025-09-07T07:34:42.4523103Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4523225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4523261Z return func(x) 2025-09-07T07:34:42.4523307Z ^^^^^^^ 2025-09-07T07:34:42.4523446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.4523509Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4523551Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4523719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4524732Z return func(*args, **kwargs) 2025-09-07T07:34:42.4524769Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4524950Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4525036Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4525038Z 2025-09-07T07:34:42.4525245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4525251Z 2025-09-07T07:34:42.4525252Z 2025-09-07T07:34:42.4525323Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4525517Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4525520Z 2025-09-07T07:34:42.4525606Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4525679Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4525713Z inline_call [] 2025-09-07T07:34:42.4525770Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4525804Z inductor [] 2025-09-07T07:34:42.4525877Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4525968Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4526229Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4526341Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4526418Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4526624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4526710Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4526842Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4526962Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4527036Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4528047Z inline_call [] 2025-09-07T07:34:42.4528103Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4528176Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4528269Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4528526Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4528635Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4528709Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4528859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4528966Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4529095Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4529212Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4529262Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4529386Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4529430Z Traceback (most recent call last): 2025-09-07T07:34:42.4529577Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4529612Z self._run_test( 2025-09-07T07:34:42.4529726Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4529783Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4529823Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4529955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4530001Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4531006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4531160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4531205Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4531244Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4531378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4531422Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4531482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4531629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4531710Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4531748Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4531901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4531945Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4532097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4532149Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4532189Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4532330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4532383Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4532421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4532537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4532602Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4532660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4532788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4533813Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4533854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4533995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4534038Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4534120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4534257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4534297Z return aot_autograd( 2025-09-07T07:34:42.4534331Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4534467Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4534535Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4534597Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4534760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4534843Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4534887Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4535070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4535112Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4535299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4535339Z fx_g = _create_graph( 2025-09-07T07:34:42.4535374Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4535538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4535572Z fx_g = make_fx( 2025-09-07T07:34:42.4536618Z ^^^^^^^^ 2025-09-07T07:34:42.4536772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4536819Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4536886Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4537036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4537077Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4537114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4537273Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4537311Z t = dispatch_trace( 2025-09-07T07:34:42.4537346Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4537459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4537500Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4537536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4537660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4537703Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4537738Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4537900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4537979Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4538020Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4538166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4538204Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4538239Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4539333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4539375Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4539409Z ^^^^^^^^^ 2025-09-07T07:34:42.4539545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4539612Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4539648Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4539796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4539845Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4539879Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4540057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4540119Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4540164Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4540339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4540382Z outs_pair = fn(*args) 2025-09-07T07:34:42.4540415Z ^^^^^^^^^ 2025-09-07T07:34:42.4540589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4540655Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4540698Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4540872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4540912Z outs_pair = fn(*args) 2025-09-07T07:34:42.4540945Z ^^^^^^^^^ 2025-09-07T07:34:42.4542081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4542141Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4542211Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4542407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4542477Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4542522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4542697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4542735Z outs_pair = fn(*args) 2025-09-07T07:34:42.4542770Z ^^^^^^^^^ 2025-09-07T07:34:42.4542963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4543008Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4543046Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4543218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4543264Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4543300Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4543425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4543483Z return handle_torch_function( 2025-09-07T07:34:42.4543523Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4543664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4543738Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4543782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4543948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4544968Z return func(*args, **kwargs) 2025-09-07T07:34:42.4545005Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4545129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4545170Z result = _engine_run_backward( 2025-09-07T07:34:42.4545205Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4545370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4545490Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4545539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4545664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4545711Z return user_fn(self, *args) 2025-09-07T07:34:42.4545746Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4545891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4545933Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4545969Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4546127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4546173Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4546208Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4546331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4546370Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4546404Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4546634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4547678Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4547719Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4547857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4547906Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4547944Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4548106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4548153Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4548192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4548349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4548391Z t = dispatch_trace( 2025-09-07T07:34:42.4548425Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4548537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4548579Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4548616Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4548762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4548802Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4548837Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4549002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4549079Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4549119Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4549243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4549306Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4550298Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4550426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4550466Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4550500Z ^^^^^^^^^ 2025-09-07T07:34:42.4550672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4550722Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4550755Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4550796Z File "", line 1, in 
2025-09-07T07:34:42.4550939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4551016Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4551064Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4551200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4551247Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4551283Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4551478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4551521Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4551556Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4551726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4551770Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4551823Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4551973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4552014Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4553015Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4553150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4553240Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4553286Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4553411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4553470Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4553514Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4553642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4553681Z leaves = list(leaves) 2025-09-07T07:34:42.4553715Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4553839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4553873Z return func(x) 2025-09-07T07:34:42.4553905Z ^^^^^^^ 2025-09-07T07:34:42.4554063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.4554129Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4554170Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4554338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4554379Z return func(*args, **kwargs) 2025-09-07T07:34:42.4554415Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4554614Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4554699Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4554702Z 2025-09-07T07:34:42.4555870Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4555873Z 2025-09-07T07:34:42.4555875Z 2025-09-07T07:34:42.4555966Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4556160Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4556162Z 2025-09-07T07:34:42.4556248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4556324Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4556361Z inline_call [] 2025-09-07T07:34:42.4556417Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4556451Z inductor [] 2025-09-07T07:34:42.4556577Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4556649Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4556911Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4557022Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4557098Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4557248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4557363Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4557494Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4557612Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4557684Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4557718Z inline_call [] 2025-09-07T07:34:42.4557775Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4557847Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4557916Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4559143Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4559260Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4559336Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4559485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4559601Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4559732Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4559850Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4559919Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4559954Z inline_call [] 2025-09-07T07:34:42.4560008Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4560100Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4560235Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4560491Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4560600Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4560695Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4560844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4560928Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4561056Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4561176Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4561395Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-aa44cd38caeaebd1.xml - 2025-09-07T07:34:42.4561454Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4562832Z FAILED [0.2232s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4562918Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4562920Z 2025-09-07T07:34:42.4563126Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4563152Z 2025-09-07T07:34:42.4563153Z 2025-09-07T07:34:42.4563225Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4563417Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4563419Z 2025-09-07T07:34:42.4563505Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4563566Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.4563632Z ================== 1 failed, 245 deselected, 2 rerun in 1.17s ================== 2025-09-07T07:34:42.4563666Z Got exit code 1 2025-09-07T07:34:42.4563704Z Retrying single test... 2025-09-07T07:34:42.4564130Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4564172Z import pkg_resources 2025-09-07T07:34:42.4564341Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-dd0453c0531b129e.xml 2025-09-07T07:34:42.4564412Z ============================= test session starts ============================== 2025-09-07T07:34:42.4564528Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4564568Z cachedir: .pytest_cache 2025-09-07T07:34:42.4564723Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4564767Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4564805Z configfile: pytest.ini 2025-09-07T07:34:42.4564967Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4565062Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.4566308Z stepcurrent: skipping 60 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4566364Z Running 1 items in this shard 2025-09-07T07:34:42.4566366Z 2025-09-07T07:34:42.4566636Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4066s] [100%] 2025-09-07T07:34:42.4566826Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2143s] [100%] 2025-09-07T07:34:42.4566990Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True FAILED [0.2089s] [100%] 2025-09-07T07:34:42.4566998Z 2025-09-07T07:34:42.4567048Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4567152Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4567196Z Traceback (most recent call last): 2025-09-07T07:34:42.4567346Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4567380Z self._run_test( 2025-09-07T07:34:42.4567494Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4567550Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4567591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4567725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4567805Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4567845Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4568000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4568046Z 
return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4568084Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4568221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4568266Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4568302Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4569420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4569501Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4569540Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4569695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4569741Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4569890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4569944Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4570006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4570150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4570200Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4570239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4570355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4570424Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4570490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4570617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4570681Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.4570724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4570882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4570926Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4570963Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4571101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4572104Z return aot_autograd( 2025-09-07T07:34:42.4572139Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4572277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4572348Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4572394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4572553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4572639Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4572684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4572866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4572908Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4573094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4573171Z fx_g = _create_graph( 2025-09-07T07:34:42.4573208Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4573370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4573405Z fx_g = make_fx( 2025-09-07T07:34:42.4573437Z 
^^^^^^^^ 2025-09-07T07:34:42.4573591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4573636Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4573674Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4573819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4573862Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4573897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4575016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4575054Z t = dispatch_trace( 2025-09-07T07:34:42.4575088Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4575201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4575243Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4575299Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4575426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4575467Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4575501Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4575662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4575741Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4575805Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4575931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4575971Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4576006Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4576134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.4576175Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4576226Z ^^^^^^^^^ 2025-09-07T07:34:42.4576359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4576399Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4576434Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4577610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4577664Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4577698Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4577856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4577919Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4577965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4578144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4578183Z outs_pair = fn(*args) 2025-09-07T07:34:42.4578217Z ^^^^^^^^^ 2025-09-07T07:34:42.4578389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4578454Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4578536Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4578709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4578748Z outs_pair = fn(*args) 2025-09-07T07:34:42.4578782Z ^^^^^^^^^ 2025-09-07T07:34:42.4578961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4579021Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4579064Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4579258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4579327Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4579375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4579546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4580562Z outs_pair = fn(*args) 2025-09-07T07:34:42.4580597Z ^^^^^^^^^ 2025-09-07T07:34:42.4580812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4580859Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4580895Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4581065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4581110Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4581147Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4581275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4581339Z return handle_torch_function( 2025-09-07T07:34:42.4581374Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4581515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4581590Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4581653Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4581819Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4581860Z return func(*args, **kwargs) 2025-09-07T07:34:42.4581895Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4582018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4582061Z result = _engine_run_backward( 2025-09-07T07:34:42.4582096Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4582242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4583331Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4583384Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4583512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4583553Z return user_fn(self, *args) 2025-09-07T07:34:42.4583588Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4583733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4583776Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4583836Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4583997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4584041Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4584077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4584201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4584240Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4584276Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4584441Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4584493Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4584532Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4584669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4584724Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4584765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4584925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4586158Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4586219Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4586382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4586420Z t = dispatch_trace( 2025-09-07T07:34:42.4586454Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4586630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4586673Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4586711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4586857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4586895Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4586930Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4587090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4587169Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4587230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4587354Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4587394Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4587427Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4587556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4587599Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4587637Z ^^^^^^^^^ 2025-09-07T07:34:42.4587785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4587834Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4588882Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4588926Z File "", line 1, in 2025-09-07T07:34:42.4589073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4589151Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4589196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4589333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4589379Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4589448Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4589641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4589684Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4589720Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4589892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4589936Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4589972Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4590116Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4590157Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4590193Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4590329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4590418Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4590463Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4590588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4590670Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4591692Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4591819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4591858Z leaves = list(leaves) 2025-09-07T07:34:42.4591892Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4592014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4592051Z return func(x) 2025-09-07T07:34:42.4592102Z ^^^^^^^ 2025-09-07T07:34:42.4592240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4592304Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4592345Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4592513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4592568Z return func(*args, **kwargs) 2025-09-07T07:34:42.4592605Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4592785Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4592871Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.4592874Z 2025-09-07T07:34:42.4593082Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4593085Z 2025-09-07T07:34:42.4593087Z 2025-09-07T07:34:42.4593160Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4593353Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4593355Z 2025-09-07T07:34:42.4593442Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4593517Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4593551Z inline_call [] 2025-09-07T07:34:42.4594578Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4594613Z inductor [] 2025-09-07T07:34:42.4594686Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4594785Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4595043Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4595155Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4595231Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4595384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4595468Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4595601Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4595720Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4595829Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4595873Z Traceback (most recent call last): 2025-09-07T07:34:42.4596019Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4596054Z self._run_test( 2025-09-07T07:34:42.4596183Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4596239Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4596279Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4596411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4596456Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4596567Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4596722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4597799Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4597838Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4597974Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4598019Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4598056Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4598224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4598307Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4598345Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4598497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4598547Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4598696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4598748Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4598788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4598932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4598983Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4599021Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4599138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4599203Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4599247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4599397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4599460Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4600531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4600675Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4600720Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4600758Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4600896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4600935Z return aot_autograd( 2025-09-07T07:34:42.4600970Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4601105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4601176Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4601224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4601385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4601467Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4601538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4601722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4601766Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4601951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4601992Z fx_g = _create_graph( 2025-09-07T07:34:42.4602027Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4602211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4602245Z fx_g = make_fx( 2025-09-07T07:34:42.4602278Z ^^^^^^^^ 2025-09-07T07:34:42.4602429Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4603456Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4603512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4603662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4603704Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4603741Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4603900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4603942Z t = dispatch_trace( 2025-09-07T07:34:42.4603976Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4604090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4604130Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4604166Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4604292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4604332Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4604367Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4604528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4604608Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4604648Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4604794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4604832Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4604868Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4604993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4606006Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4606041Z ^^^^^^^^^ 2025-09-07T07:34:42.4606177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4606217Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4606252Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4606400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4606450Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4606552Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4606711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4606772Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4606816Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4607019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4607061Z outs_pair = fn(*args) 2025-09-07T07:34:42.4607097Z ^^^^^^^^^ 2025-09-07T07:34:42.4607271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4607336Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4607381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4607557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4607619Z outs_pair = fn(*args) 2025-09-07T07:34:42.4607653Z ^^^^^^^^^ 2025-09-07T07:34:42.4607830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4607892Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4608929Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4609153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4609224Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4609269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4609442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4609484Z outs_pair = fn(*args) 2025-09-07T07:34:42.4609517Z ^^^^^^^^^ 2025-09-07T07:34:42.4609709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4609753Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4609791Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4609961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4610008Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4610044Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4610169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4610250Z return handle_torch_function( 2025-09-07T07:34:42.4610288Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4610429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4610504Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4610548Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4610718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4610758Z return func(*args, **kwargs) 2025-09-07T07:34:42.4611768Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4611892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4611934Z result = _engine_run_backward( 2025-09-07T07:34:42.4611969Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4612118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4612240Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4612289Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4612433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4612476Z return user_fn(self, *args) 2025-09-07T07:34:42.4612512Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4612657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4612699Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4612735Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4612893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4612955Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4612992Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4613117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4613156Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4613191Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4613372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4613424Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4613463Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4614573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4614624Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4614663Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4614827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4614874Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4614913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4615072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4615110Z t = dispatch_trace( 2025-09-07T07:34:42.4615145Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4615259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4615300Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4615336Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4615461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4615520Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4615554Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4615714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4615792Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4615832Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4615959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4615996Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4616031Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4616157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4617244Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4617278Z ^^^^^^^^^ 2025-09-07T07:34:42.4617432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4617481Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4617515Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4617555Z File "", line 1, in 2025-09-07T07:34:42.4617700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4617803Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4617850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4617986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4618033Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4618071Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4618263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4618343Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4618378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4618549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4618595Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4618631Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4618798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4618840Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4618875Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4619987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4620076Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4620124Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4620253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4620313Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4620356Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4620487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4620524Z leaves = list(leaves) 2025-09-07T07:34:42.4620559Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4620681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4620716Z return func(x) 2025-09-07T07:34:42.4620748Z ^^^^^^^ 2025-09-07T07:34:42.4620886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4620977Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4621018Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4621184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4621225Z return func(*args, **kwargs) 2025-09-07T07:34:42.4621261Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4621442Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4621527Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4621530Z 2025-09-07T07:34:42.4621736Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4621741Z 2025-09-07T07:34:42.4621743Z 2025-09-07T07:34:42.4621813Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4622980Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4622983Z 2025-09-07T07:34:42.4623087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4623165Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4623199Z inline_call [] 2025-09-07T07:34:42.4623256Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4623290Z inductor [] 2025-09-07T07:34:42.4623363Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4623435Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4623696Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4623824Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4623900Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4624052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4624152Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4624283Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4624403Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4624473Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4624510Z inline_call [] 2025-09-07T07:34:42.4624565Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4624637Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4624705Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4624960Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4626040Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4626116Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4626265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4626349Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4626555Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4626677Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4626725Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4626831Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4626875Z Traceback (most recent call last): 2025-09-07T07:34:42.4627022Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4627056Z self._run_test( 2025-09-07T07:34:42.4627169Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4627224Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4627267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4627400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4627445Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4627485Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4627669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4627717Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4627756Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4627892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4627935Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4628957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4629099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4629208Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4629245Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4629398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4629443Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4629611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4629665Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4629706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4629847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4629898Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4629939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4630058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4630124Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4630169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4630297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4630359Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4630401Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4630541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4630584Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4630621Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4631729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4631799Z return aot_autograd( 2025-09-07T07:34:42.4631834Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4631970Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4632038Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4632084Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4632247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4632329Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4632375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4632557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4632604Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4632788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4632828Z fx_g = _create_graph( 2025-09-07T07:34:42.4632862Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4633042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4633078Z fx_g = make_fx( 2025-09-07T07:34:42.4633111Z ^^^^^^^^ 2025-09-07T07:34:42.4633262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4633307Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4633344Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4633490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4633547Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4634552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4634711Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4634749Z t = dispatch_trace( 2025-09-07T07:34:42.4634783Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4634915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4634956Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4634991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4635115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4635154Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4635189Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4635351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4635433Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4635474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4635599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4635638Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4635673Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4635799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4635840Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4635873Z ^^^^^^^^^ 2025-09-07T07:34:42.4636005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4636066Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4637123Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4637273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4637323Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4637356Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4637514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4637577Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4637622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4637796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4637835Z outs_pair = fn(*args) 2025-09-07T07:34:42.4637868Z ^^^^^^^^^ 2025-09-07T07:34:42.4638043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4638109Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4638153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4638357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4638396Z outs_pair = fn(*args) 2025-09-07T07:34:42.4638431Z ^^^^^^^^^ 2025-09-07T07:34:42.4638610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4638669Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4638711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4638906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4639001Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4639046Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4640248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4640289Z outs_pair = fn(*args) 2025-09-07T07:34:42.4640322Z ^^^^^^^^^ 2025-09-07T07:34:42.4640540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4640585Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4640622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4640790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4640840Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4640876Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4641003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4641044Z return handle_torch_function( 2025-09-07T07:34:42.4641081Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4641225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4641300Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4641344Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4641512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4641552Z return func(*args, **kwargs) 2025-09-07T07:34:42.4641615Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4641739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4641781Z result = _engine_run_backward( 2025-09-07T07:34:42.4641815Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4642932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4643056Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4643105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4643231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4643272Z return user_fn(self, *args) 2025-09-07T07:34:42.4643307Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4643454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4643497Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4643534Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4643690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4643754Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4643790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4643914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4643954Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4643989Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4644154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4644207Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4644264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4644399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4644449Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4644486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4644650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4645679Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4645720Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4645879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4645917Z t = dispatch_trace( 2025-09-07T07:34:42.4645950Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4646064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4646109Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4646145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4646267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4646305Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4646340Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4646598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4646675Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4646716Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4646838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4646876Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4646938Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4647065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4647105Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4647139Z ^^^^^^^^^ 2025-09-07T07:34:42.4647288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4647338Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4648354Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4648396Z File "", line 1, in 
2025-09-07T07:34:42.4648548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4648624Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4648669Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4648808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4648855Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4648893Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4649085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4649150Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4649190Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4649361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4649405Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4649441Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4649585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4649653Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4649689Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4649821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4649909Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4649955Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4650097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4651123Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4651168Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4651293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4651331Z leaves = list(leaves) 2025-09-07T07:34:42.4651369Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4651490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4651525Z return func(x) 2025-09-07T07:34:42.4651557Z ^^^^^^^ 2025-09-07T07:34:42.4651696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4651762Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4651805Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4651970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4652011Z return func(*args, **kwargs) 2025-09-07T07:34:42.4652046Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4652226Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4652334Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4652336Z 2025-09-07T07:34:42.4652542Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4652545Z 2025-09-07T07:34:42.4652546Z 2025-09-07T07:34:42.4652618Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4652813Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4652815Z 2025-09-07T07:34:42.4652901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4652974Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4653008Z inline_call [] 2025-09-07T07:34:42.4654034Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4654068Z inductor [] 2025-09-07T07:34:42.4654142Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4654213Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4654489Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4654601Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4654678Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4654829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4654914Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4655063Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4655184Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4655254Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4655289Z inline_call [] 2025-09-07T07:34:42.4655344Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4655432Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4655502Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4655755Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4655863Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4655938Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4656087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4656172Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4657374Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4657496Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4657564Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4657599Z inline_call [] 2025-09-07T07:34:42.4657654Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4657725Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4657831Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4658089Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4658196Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4658270Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4658419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4658502Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4658630Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4658747Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4658963Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-dd0453c0531b129e.xml - 2025-09-07T07:34:42.4659021Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4659400Z FAILED [0.2089s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4659484Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4659487Z 2025-09-07T07:34:42.4659691Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4659695Z 2025-09-07T07:34:42.4659696Z 2025-09-07T07:34:42.4659791Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4659980Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4659983Z 2025-09-07T07:34:42.4661053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4661115Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.4661200Z ================== 1 failed, 245 deselected, 2 rerun in 1.00s ================== 2025-09-07T07:34:42.4661236Z Got exit code 1 2025-09-07T07:34:42.4661360Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.4661781Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4661825Z import pkg_resources 2025-09-07T07:34:42.4661993Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-23c3cb5017c3f022.xml 2025-09-07T07:34:42.4662049Z ============================= test session starts ============================== 2025-09-07T07:34:42.4662163Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4662203Z cachedir: .pytest_cache 2025-09-07T07:34:42.4662358Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4662402Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4662440Z configfile: pytest.ini 2025-09-07T07:34:42.4662601Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4662700Z collecting ... collected 467 items / 61 deselected / 406 selected 2025-09-07T07:34:42.4662751Z stepcurrent: skipping 61 already run items. 
2025-09-07T07:34:42.4662792Z Running 185 items in this shard 2025-09-07T07:34:42.4662794Z 2025-09-07T07:34:42.4662969Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_False PASSED [1.9094s] [ 0%] 2025-09-07T07:34:42.4663164Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2525s] [ 1%] 2025-09-07T07:34:42.4663357Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4153s] [ 1%] 2025-09-07T07:34:42.4663524Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True FAILED [0.2094s] [ 1%] 2025-09-07T07:34:42.4663529Z 2025-09-07T07:34:42.4664564Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4664669Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4664712Z Traceback (most recent call last): 2025-09-07T07:34:42.4664876Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4664913Z self._run_test( 2025-09-07T07:34:42.4665027Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4665083Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4665122Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4665256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4665305Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4665361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4665511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in 
_wrapped_call_impl 2025-09-07T07:34:42.4665557Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4665595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4665732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4665792Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4665830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4665972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4666053Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4666091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4666244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4667529Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4667685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4667738Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4667780Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4667923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4667973Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4668011Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4668125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4668192Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4668273Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4668400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4668462Z return 
lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4668504Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4668643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4668689Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4668726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4668863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4668902Z return aot_autograd( 2025-09-07T07:34:42.4668938Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4669076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4669146Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4669191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4670340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4670450Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4670497Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4670683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4670725Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4670910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4670973Z fx_g = _create_graph( 2025-09-07T07:34:42.4671007Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4671170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4671203Z 
fx_g = make_fx( 2025-09-07T07:34:42.4671235Z ^^^^^^^^ 2025-09-07T07:34:42.4671387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4671454Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4671491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4671638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4671681Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4671717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4671876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4671916Z t = dispatch_trace( 2025-09-07T07:34:42.4671949Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4672061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4673067Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4673106Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4673233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4673272Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4673308Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4673468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4673547Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4673609Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4673737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4673774Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4673809Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4673937Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4673978Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4674013Z ^^^^^^^^^ 2025-09-07T07:34:42.4674145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4674184Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4674218Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4674367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4674418Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4674452Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4674608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4674670Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4675677Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4675871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4675911Z outs_pair = fn(*args) 2025-09-07T07:34:42.4675947Z ^^^^^^^^^ 2025-09-07T07:34:42.4676119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4676186Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4676267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4676440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4676478Z outs_pair = fn(*args) 2025-09-07T07:34:42.4676587Z ^^^^^^^^^ 2025-09-07T07:34:42.4676766Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4676857Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4676899Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4677095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4677164Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4677211Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4677385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4677423Z outs_pair = fn(*args) 2025-09-07T07:34:42.4677456Z ^^^^^^^^^ 2025-09-07T07:34:42.4677649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4677695Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4678714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4678885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4678931Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4678967Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4679123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4679168Z return handle_torch_function( 2025-09-07T07:34:42.4679203Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4679345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4679419Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.4679465Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4679634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4679674Z return func(*args, **kwargs) 2025-09-07T07:34:42.4679708Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4679832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4679876Z result = _engine_run_backward( 2025-09-07T07:34:42.4679913Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4680060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4680242Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4680317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4680445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4680486Z return user_fn(self, *args) 2025-09-07T07:34:42.4680522Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4681651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4681696Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4681736Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4681919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4681962Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4681998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4682120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4682161Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.4682195Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4682377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4682429Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4682469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4682605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4682658Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4682696Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4682858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4682905Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4682945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4683106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4683144Z t = dispatch_trace( 2025-09-07T07:34:42.4683178Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4683291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4684307Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4684342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4684495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4684534Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4684569Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4684729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4684808Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4684848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4684974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4685011Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4685046Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4685171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4685216Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4685250Z ^^^^^^^^^ 2025-09-07T07:34:42.4685399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4685447Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4685480Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4685521Z File "", line 1, in 2025-09-07T07:34:42.4685678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4685759Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4685804Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4687000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4687052Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4687091Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4687317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4687360Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4687394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4687565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4687629Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4687667Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.4687811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4687853Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4687888Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4688022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4688110Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4688156Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4688280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4688342Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4688386Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4688512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4688550Z leaves = list(leaves) 2025-09-07T07:34:42.4688585Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4688707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4689742Z return func(x) 2025-09-07T07:34:42.4689778Z ^^^^^^^ 2025-09-07T07:34:42.4689916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4689981Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4690023Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4690191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4690234Z return func(*args, **kwargs) 2025-09-07T07:34:42.4690268Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4690449Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4690534Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4690536Z 2025-09-07T07:34:42.4690743Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4690749Z 2025-09-07T07:34:42.4690751Z 2025-09-07T07:34:42.4690823Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4691016Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4691038Z 2025-09-07T07:34:42.4691126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4691201Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4691235Z inline_call [] 2025-09-07T07:34:42.4691292Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4691365Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4691436Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4691713Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4691826Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4692874Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4693046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4693132Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4693263Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4693382Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4693489Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4693535Z Traceback (most recent call last): 2025-09-07T07:34:42.4693681Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4693716Z self._run_test( 2025-09-07T07:34:42.4693830Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4693884Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4693925Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4694056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4694101Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4694140Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4694290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4694356Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4694394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4694529Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4694572Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4694610Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4694754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4695797Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4695836Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4695990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4696037Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4696189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4696241Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4696280Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4696441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4696606Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4696645Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4696761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4696826Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4696871Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4696996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4697092Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4697133Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4697273Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4697316Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4697353Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4697509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4697549Z return aot_autograd( 2025-09-07T07:34:42.4698578Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4698715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4698784Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4698832Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4698993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4699076Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4699121Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4699306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4699349Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4699534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4699573Z fx_g = _create_graph( 2025-09-07T07:34:42.4699608Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4699796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4699830Z fx_g = make_fx( 2025-09-07T07:34:42.4699863Z ^^^^^^^^ 2025-09-07T07:34:42.4700014Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4700060Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4700097Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4700244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4700285Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4700321Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4700479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4701487Z t = dispatch_trace( 2025-09-07T07:34:42.4701522Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4701636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4701676Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4701712Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4701836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4701897Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4701932Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4702096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4702175Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4702215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4702338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4702394Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4702429Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4702555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4702596Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4702630Z ^^^^^^^^^ 2025-09-07T07:34:42.4702763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4702817Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4702853Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4703001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4704015Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4704049Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4704205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4704270Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4704314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4704488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4704529Z outs_pair = fn(*args) 2025-09-07T07:34:42.4704562Z ^^^^^^^^^ 2025-09-07T07:34:42.4704735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4704800Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4704845Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4705017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4705076Z outs_pair = fn(*args) 2025-09-07T07:34:42.4705109Z ^^^^^^^^^ 2025-09-07T07:34:42.4705286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4705346Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4705388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4705583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4705653Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4705699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4705869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4705911Z outs_pair = fn(*args) 2025-09-07T07:34:42.4706980Z ^^^^^^^^^ 2025-09-07T07:34:42.4707173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4707217Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4707254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4707449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4707497Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4707532Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4707656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4707697Z return handle_torch_function( 2025-09-07T07:34:42.4707736Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4707900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4707974Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4708017Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4708184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4708244Z return func(*args, **kwargs) 2025-09-07T07:34:42.4708280Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4708404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4708447Z result = _engine_run_backward( 2025-09-07T07:34:42.4708481Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4708627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4708748Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4709772Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4709900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4709943Z return user_fn(self, *args) 2025-09-07T07:34:42.4709978Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4710124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4710166Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4710202Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4710363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4710434Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4710471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4710593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4710633Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4710668Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4710834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4710886Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4710926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4711062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4711110Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4711147Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4711310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4711356Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4711396Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4712516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4712574Z t = dispatch_trace( 2025-09-07T07:34:42.4712608Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4712722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4712764Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4712800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4712923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4712962Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4713016Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4713175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4713253Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4713293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4713417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4713473Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4713509Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4713634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4713675Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4713708Z ^^^^^^^^^ 2025-09-07T07:34:42.4713856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4713907Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4713941Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4713981Z File "", line 1, in 2025-09-07T07:34:42.4715096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4715175Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4715222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4715358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4715405Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4715442Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4715634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4715703Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4715745Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4715914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4715958Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4715995Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4716139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4716179Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4716214Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4716349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4716436Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4716550Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4716674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4716734Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4716777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4717919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4717959Z leaves = list(leaves) 2025-09-07T07:34:42.4717994Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4718117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4718152Z return func(x) 2025-09-07T07:34:42.4718184Z ^^^^^^^ 2025-09-07T07:34:42.4718321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4718408Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4718449Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4718616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4718657Z return func(*args, **kwargs) 2025-09-07T07:34:42.4718692Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4718896Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4718981Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4718983Z 2025-09-07T07:34:42.4719188Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4719191Z 2025-09-07T07:34:42.4719193Z 2025-09-07T07:34:42.4719266Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4719461Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4719463Z 2025-09-07T07:34:42.4719547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4719622Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4719656Z inline_call [] 2025-09-07T07:34:42.4719714Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4719786Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4720921Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4721180Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4721329Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4721406Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4721558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4721645Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4721778Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4721896Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4721968Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4722002Z inline_call [] 2025-09-07T07:34:42.4722061Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4722133Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4722203Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4722454Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4722585Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4722660Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4722811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4722893Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4723021Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4723158Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4723208Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4724297Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4724344Z Traceback (most recent call last): 2025-09-07T07:34:42.4724510Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4724546Z self._run_test( 2025-09-07T07:34:42.4724658Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4724715Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4724754Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4724887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4724937Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4724977Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4725126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4725171Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4725213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4725349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4725392Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4725428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4725572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4725652Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4725713Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4725863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4725910Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4726060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4727181Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4727224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4727365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4727414Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4727453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4727567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4727637Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4727680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4727809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4727873Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4727949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4728089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4728133Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4728169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4728307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4728348Z return aot_autograd( 2025-09-07T07:34:42.4728404Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4728539Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4728606Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4728652Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4728811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4729898Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4729944Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4730127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4730170Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4730357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4730396Z fx_g = _create_graph( 2025-09-07T07:34:42.4730431Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4730593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4730629Z fx_g = make_fx( 2025-09-07T07:34:42.4730661Z ^^^^^^^^ 2025-09-07T07:34:42.4730813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4730858Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4730896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4731041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4731115Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4731152Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4731310Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4731347Z t = dispatch_trace( 2025-09-07T07:34:42.4731381Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4731494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4731535Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4731571Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4732671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4732711Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4732746Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4732908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4732990Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4733029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4733154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4733192Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4733225Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4733373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4733414Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4733449Z ^^^^^^^^^ 2025-09-07T07:34:42.4733580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4733621Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4733655Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4733805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4733869Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4733903Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4734059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4734122Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4734186Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4735335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4735374Z outs_pair = fn(*args) 2025-09-07T07:34:42.4735408Z ^^^^^^^^^ 2025-09-07T07:34:42.4735579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4735649Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4735692Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4735865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4735902Z outs_pair = fn(*args) 2025-09-07T07:34:42.4735937Z ^^^^^^^^^ 2025-09-07T07:34:42.4736114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4736173Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4736214Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4736408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4736581Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4736626Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4736798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4736835Z outs_pair = fn(*args) 2025-09-07T07:34:42.4736869Z ^^^^^^^^^ 2025-09-07T07:34:42.4737062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4737108Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4737143Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4737310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4738342Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4738384Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4738509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4738551Z return handle_torch_function( 2025-09-07T07:34:42.4738586Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4738752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4738828Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4738873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4739038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4739079Z return func(*args, **kwargs) 2025-09-07T07:34:42.4739114Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4739240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4739301Z result = _engine_run_backward( 2025-09-07T07:34:42.4739337Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4739482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4739603Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4739671Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4739799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4739840Z return user_fn(self, *args) 2025-09-07T07:34:42.4739876Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4740020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4741036Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4741073Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4741232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4741275Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4741311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4741437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4741477Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4741511Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4741675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4741726Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4741765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4741935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4741984Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4742022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4742182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4742230Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4742270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4742431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4742468Z t = dispatch_trace( 2025-09-07T07:34:42.4742502Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4742614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4742659Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4743662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4743788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4743826Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4743861Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4744038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4744117Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4744157Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4744281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4744318Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4744352Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4744480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4744537Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4744571Z ^^^^^^^^^ 2025-09-07T07:34:42.4744719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4744768Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4744802Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4744844Z File "", line 1, in 
2025-09-07T07:34:42.4745113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4745191Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4745236Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4745372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4745422Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4746434Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4746700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4746744Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4746780Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4746952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4746996Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4747032Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4747174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4747216Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4747283Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4747417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4747503Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4747550Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4747677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4747738Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4747781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4747907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4747944Z leaves = list(leaves) 2025-09-07T07:34:42.4747977Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4748102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4748136Z return func(x) 2025-09-07T07:34:42.4749151Z ^^^^^^^ 2025-09-07T07:34:42.4749289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4749353Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4749418Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4749588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4749628Z return func(*args, **kwargs) 2025-09-07T07:34:42.4749664Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4749844Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4749929Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4749959Z 2025-09-07T07:34:42.4750166Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4750168Z 2025-09-07T07:34:42.4750170Z 2025-09-07T07:34:42.4750243Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4750454Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4750458Z 2025-09-07T07:34:42.4750543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4750615Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4750650Z inline_call [] 2025-09-07T07:34:42.4750706Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4750782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4750854Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4751111Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4751223Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4751300Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4751449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4752676Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4752806Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4752948Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4753020Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4753055Z inline_call [] 2025-09-07T07:34:42.4753110Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4753183Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4753253Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4753510Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4753619Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4753693Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4753844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4753930Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4754058Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4754191Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4754261Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4754296Z inline_call [] 2025-09-07T07:34:42.4754350Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4754421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4754489Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4754741Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4755836Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4755911Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4756060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4756165Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4756293Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4756410Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4756715Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-23c3cb5017c3f022.xml - 2025-09-07T07:34:42.4756778Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4757138Z FAILED [0.2094s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4757223Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4757226Z 2025-09-07T07:34:42.4757431Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4757434Z 2025-09-07T07:34:42.4757436Z 2025-09-07T07:34:42.4757507Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4757723Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4757726Z 2025-09-07T07:34:42.4757810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4757869Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.4757940Z ============= 1 failed, 1 passed, 61 deselected, 2 rerun in 2.99s ============== 2025-09-07T07:34:42.4757975Z Got exit code 1 2025-09-07T07:34:42.4758014Z Retrying single test... 2025-09-07T07:34:42.4758438Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4758476Z import pkg_resources 2025-09-07T07:34:42.4758647Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-fcc9b8c4aa523be3.xml 2025-09-07T07:34:42.4759694Z ============================= test session starts ============================== 2025-09-07T07:34:42.4759809Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4759848Z cachedir: .pytest_cache 2025-09-07T07:34:42.4760026Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4760072Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4760110Z configfile: pytest.ini 2025-09-07T07:34:42.4760324Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4760401Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.4760632Z stepcurrent: skipping 62 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4760696Z Running 1 items in this shard 2025-09-07T07:34:42.4760698Z 2025-09-07T07:34:42.4760895Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.6028s] [100%] 2025-09-07T07:34:42.4761106Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2226s] [100%] 2025-09-07T07:34:42.4761273Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True FAILED [0.2209s] [100%] 2025-09-07T07:34:42.4761276Z 2025-09-07T07:34:42.4761324Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4761431Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4761476Z Traceback (most recent call last): 2025-09-07T07:34:42.4761626Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4761660Z self._run_test( 2025-09-07T07:34:42.4761774Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4761831Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4761872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4762006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4763040Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4763079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4763232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.4763303Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4763342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4763477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4763521Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4763557Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4763702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4763784Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4763823Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4763975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4764020Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4764171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4764228Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4764268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4764410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4764477Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4764515Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4764633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4764698Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4765713Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4765840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4765925Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.4765966Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4766109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4766153Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4766190Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4766600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4766641Z return aot_autograd( 2025-09-07T07:34:42.4766676Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4766812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4766880Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4766926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4767089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4767172Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4767216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4767400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4767444Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4767630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4767669Z fx_g = _create_graph( 2025-09-07T07:34:42.4767704Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4767866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4768918Z fx_g = make_fx( 
2025-09-07T07:34:42.4768952Z ^^^^^^^^ 2025-09-07T07:34:42.4769105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4769151Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4769188Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4769337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4769381Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4769417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4769575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4769613Z t = dispatch_trace( 2025-09-07T07:34:42.4769646Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4769764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4769804Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4769840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4769964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4770005Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4770060Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4770223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4770301Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4770341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4770464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4771470Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4771526Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4771653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.4771694Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4771729Z ^^^^^^^^^ 2025-09-07T07:34:42.4771863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4771903Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4771955Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4772105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4772154Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4772187Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4772342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4772407Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4772450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4772624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4772664Z outs_pair = fn(*args) 2025-09-07T07:34:42.4772700Z ^^^^^^^^^ 2025-09-07T07:34:42.4772873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4772939Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4772984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4773155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4773215Z outs_pair = fn(*args) 2025-09-07T07:34:42.4774223Z ^^^^^^^^^ 2025-09-07T07:34:42.4774402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4774460Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4774503Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4774698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4774769Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4774813Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4774986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4775026Z outs_pair = fn(*args) 2025-09-07T07:34:42.4775063Z ^^^^^^^^^ 2025-09-07T07:34:42.4775252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4775297Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4775332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4775516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4775564Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4775601Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4775727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4775770Z return handle_torch_function( 2025-09-07T07:34:42.4775805Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4775949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4776048Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4777135Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4777303Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4777346Z return func(*args, **kwargs) 2025-09-07T07:34:42.4777410Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4777535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4777577Z result = _engine_run_backward( 2025-09-07T07:34:42.4777612Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4777758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4777880Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4777931Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4778057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4778099Z return user_fn(self, *args) 2025-09-07T07:34:42.4778135Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4778280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4778323Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4778360Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4778518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4778561Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4778618Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4778741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4778780Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4778816Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4779954Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4780007Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4780048Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4780187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4780235Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4780274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4780433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4780485Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4780523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4780680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4780718Z t = dispatch_trace( 2025-09-07T07:34:42.4780773Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4780889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4780931Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4780967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4781090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4781129Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4781163Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4781344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4781422Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4781463Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4781586Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4782590Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4782625Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4782766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4782807Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4782841Z ^^^^^^^^^ 2025-09-07T07:34:42.4782991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4783041Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4783076Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4783118Z File "", line 1, in 2025-09-07T07:34:42.4783263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4783340Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4783385Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4783524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4783570Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4783608Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4783798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4783860Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4783897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4784067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4784111Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4784147Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4785259Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4785304Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4785339Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4785473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4785560Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4785605Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4785733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4785792Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4785835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4785960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4786014Z leaves = list(leaves) 2025-09-07T07:34:42.4786048Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4786174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4786208Z return func(x) 2025-09-07T07:34:42.4786241Z ^^^^^^^ 2025-09-07T07:34:42.4786377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4786442Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4786562Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4786730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4786770Z return func(*args, **kwargs) 2025-09-07T07:34:42.4786805Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4786986Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4788077Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.4788080Z 2025-09-07T07:34:42.4788287Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4788289Z 2025-09-07T07:34:42.4788291Z 2025-09-07T07:34:42.4788365Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4788560Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4788563Z 2025-09-07T07:34:42.4788650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4788723Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4788760Z inline_call [] 2025-09-07T07:34:42.4788815Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4788850Z inductor [] 2025-09-07T07:34:42.4788924Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4788995Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4789253Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4789385Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4789461Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4789613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4789698Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4789830Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4789948Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4790053Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4790096Z Traceback (most recent call last): 2025-09-07T07:34:42.4790243Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4791247Z self._run_test( 2025-09-07T07:34:42.4791362Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4791416Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4791456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4791606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4791655Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4791695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4791845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4791890Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4791928Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4792084Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4792127Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4792165Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4792305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4792387Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4792438Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4792592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4792637Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4792785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4792838Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4792879Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4793020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4794037Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4794076Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4794194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4794260Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4794304Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4794429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4794492Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4794534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4794697Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4794740Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4794777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4794913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4794952Z return aot_autograd( 2025-09-07T07:34:42.4794988Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4795124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4795193Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4795238Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4795398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4795483Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4795529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4795710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4796812Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4797002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4797042Z fx_g = _create_graph( 2025-09-07T07:34:42.4797077Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4797240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4797274Z fx_g = make_fx( 2025-09-07T07:34:42.4797309Z ^^^^^^^^ 2025-09-07T07:34:42.4797479Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4797525Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4797562Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4797707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4797750Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4797803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4797961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4797998Z t = dispatch_trace( 2025-09-07T07:34:42.4798032Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4798145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4798188Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4798224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4798349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4798388Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4798423Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4799560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4799640Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4799680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4799805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4799842Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4799877Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4800002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4800069Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4800103Z ^^^^^^^^^ 2025-09-07T07:34:42.4800276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4800316Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4800351Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4800500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4800550Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4800583Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4800740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4800801Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4800848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4801023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4801062Z outs_pair = fn(*args) 2025-09-07T07:34:42.4801096Z ^^^^^^^^^ 2025-09-07T07:34:42.4802257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4802326Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4802370Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4802541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4802580Z outs_pair = fn(*args) 2025-09-07T07:34:42.4802613Z ^^^^^^^^^ 2025-09-07T07:34:42.4802792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4802866Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4802907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4803103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4803172Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4803231Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4803403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4803442Z outs_pair = fn(*args) 2025-09-07T07:34:42.4803474Z ^^^^^^^^^ 2025-09-07T07:34:42.4803663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4803709Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4803747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4803916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4803963Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4803999Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4804125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4805130Z return handle_torch_function( 2025-09-07T07:34:42.4805168Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4805308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4805402Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4805449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4805615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4805655Z return func(*args, **kwargs) 2025-09-07T07:34:42.4805691Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4805814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4805858Z result = _engine_run_backward( 2025-09-07T07:34:42.4805892Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4806037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4806157Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4806208Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4806333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4806374Z return user_fn(self, *args) 2025-09-07T07:34:42.4806410Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4806638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4806682Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4806718Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4806878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4807900Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4807937Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4808060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4808125Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4808160Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4808326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4808377Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4808418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4808571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4808621Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4808659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4808820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4808866Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4808910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4809069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4809107Z t = dispatch_trace( 2025-09-07T07:34:42.4809140Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4809253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4809295Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4809332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4809454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4809493Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4810492Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4810653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4810760Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4810801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4810925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4810962Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4810996Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4811122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4811165Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4811199Z ^^^^^^^^^ 2025-09-07T07:34:42.4811348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4811396Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4811430Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4811473Z File "", line 1, in 2025-09-07T07:34:42.4811617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4811694Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4811739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4811888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4811939Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4811976Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4812166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4812208Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4813207Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4813382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4813443Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4813480Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4813623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4813665Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4813701Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4813848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4813937Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4813981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4814107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4814168Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4814211Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4814339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4814376Z leaves = list(leaves) 2025-09-07T07:34:42.4814411Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4814537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4814573Z return func(x) 2025-09-07T07:34:42.4814605Z ^^^^^^^ 2025-09-07T07:34:42.4814742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4814806Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4815808Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4815999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4816042Z return func(*args, **kwargs) 2025-09-07T07:34:42.4816077Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4816257Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4816343Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4816345Z 2025-09-07T07:34:42.4816620Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4816623Z 2025-09-07T07:34:42.4816625Z 2025-09-07T07:34:42.4816697Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4816893Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4816899Z 2025-09-07T07:34:42.4816985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4817059Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4817093Z inline_call [] 2025-09-07T07:34:42.4817150Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4817183Z inductor [] 2025-09-07T07:34:42.4817278Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4817350Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4817608Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4817718Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4817835Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4817984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4818070Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4818201Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4819332Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4819404Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4819439Z inline_call [] 2025-09-07T07:34:42.4819496Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4819568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4819640Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4819898Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4820005Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4820081Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4820232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4820318Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4820446Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4820564Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4820637Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4820743Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4820786Z Traceback (most recent call last): 2025-09-07T07:34:42.4820932Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4820967Z self._run_test( 2025-09-07T07:34:42.4821080Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4821134Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4821174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4822271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4822319Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4822360Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4822511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4822556Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4822595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4822747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4822791Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4822830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4822971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4823052Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4823089Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4823240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4823306Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4823456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4823510Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4823551Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4823710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4823762Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4823800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4823915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4823980Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4824989Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4825118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4825181Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4825222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4825363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4825406Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4825444Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4825579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4825618Z return aot_autograd( 2025-09-07T07:34:42.4825652Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4825788Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4825877Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4825922Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4826084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4826167Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4826213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4826395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4826438Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4826686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4826729Z fx_g = _create_graph( 2025-09-07T07:34:42.4826763Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4827897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4827932Z fx_g = make_fx( 2025-09-07T07:34:42.4827965Z ^^^^^^^^ 2025-09-07T07:34:42.4828143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4828192Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4828230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4828376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4828418Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4828454Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4828611Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4829003Z t = dispatch_trace( 2025-09-07T07:34:42.4829036Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4829150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4829190Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4829225Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4829371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4829411Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4829445Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4829607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4829685Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4829726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4829853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4830877Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4830912Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4831038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4831081Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4831115Z ^^^^^^^^^ 2025-09-07T07:34:42.4831248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4831287Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4831323Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4831470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4831545Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4831580Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4831737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4831797Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4831841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4832016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4832056Z outs_pair = fn(*args) 2025-09-07T07:34:42.4832090Z ^^^^^^^^^ 2025-09-07T07:34:42.4832263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4832328Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4832372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4832550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4833552Z outs_pair = fn(*args) 2025-09-07T07:34:42.4833587Z ^^^^^^^^^ 2025-09-07T07:34:42.4833783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4833842Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4833885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4834077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4834148Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4834192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4834383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4834421Z outs_pair = fn(*args) 2025-09-07T07:34:42.4834454Z ^^^^^^^^^ 2025-09-07T07:34:42.4834645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4834690Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4834749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4834917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4834964Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4834999Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4835125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4835168Z return handle_torch_function( 2025-09-07T07:34:42.4835203Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4835343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4835418Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4836452Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4836692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4836734Z return func(*args, **kwargs) 2025-09-07T07:34:42.4836769Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4836892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4836934Z result = _engine_run_backward( 2025-09-07T07:34:42.4836996Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4837145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4837264Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4837314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4837440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4837482Z return user_fn(self, *args) 2025-09-07T07:34:42.4837517Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4837660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4837702Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4837739Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4837898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4837942Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4837978Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4838101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4838140Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4838196Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4839346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4839397Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4839437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4839574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4839626Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4839689Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4839852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4839898Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4839937Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4840113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4840238Z t = dispatch_trace( 2025-09-07T07:34:42.4840273Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4840387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4840428Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4840464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4840587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4840629Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4840662Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4840823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4840899Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4840939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4842036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4842076Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4842109Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4842236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4842276Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4842346Z ^^^^^^^^^ 2025-09-07T07:34:42.4842497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4842546Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4842580Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4842621Z File "", line 1, in 
2025-09-07T07:34:42.4842768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4842846Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4842891Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4843026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4843073Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4843110Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4843305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4843347Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4843383Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4843569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4843614Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4843651Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4844768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4844811Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4844847Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4844979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4845094Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4845139Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4845264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4845322Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4845366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4845505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4845544Z leaves = list(leaves) 2025-09-07T07:34:42.4845579Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4845703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4845738Z return func(x) 2025-09-07T07:34:42.4845771Z ^^^^^^^ 2025-09-07T07:34:42.4845911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4845975Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4846016Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4846183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4846224Z return func(*args, **kwargs) 2025-09-07T07:34:42.4846259Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4847723Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4847812Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4847815Z 2025-09-07T07:34:42.4848021Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4848065Z 2025-09-07T07:34:42.4848067Z 2025-09-07T07:34:42.4848140Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4848332Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4848334Z 2025-09-07T07:34:42.4848421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4848497Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4848531Z inline_call [] 2025-09-07T07:34:42.4848589Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4848622Z inductor [] 2025-09-07T07:34:42.4848695Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4848765Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4849026Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4849139Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4849215Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4849389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4849475Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4849606Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4849725Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4849796Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4849854Z inline_call [] 2025-09-07T07:34:42.4849909Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4850973Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4851043Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4851325Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4851435Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4851509Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4851659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4851749Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4851879Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4851997Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4852067Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4852102Z inline_call [] 2025-09-07T07:34:42.4852157Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4852228Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4852297Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4852552Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4852678Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4852751Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4852898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4852982Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4853110Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4853228Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4854417Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-fcc9b8c4aa523be3.xml - 2025-09-07T07:34:42.4854476Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4854840Z FAILED [0.2209s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4854924Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4854941Z 2025-09-07T07:34:42.4855148Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4855151Z 2025-09-07T07:34:42.4855153Z 2025-09-07T07:34:42.4855225Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4855416Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4855438Z 2025-09-07T07:34:42.4855523Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4855581Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.4855647Z ================== 1 failed, 245 deselected, 2 rerun in 1.31s ================== 2025-09-07T07:34:42.4855681Z Got exit code 1 2025-09-07T07:34:42.4855720Z Retrying single test... 2025-09-07T07:34:42.4856159Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4856199Z import pkg_resources 2025-09-07T07:34:42.4856368Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-c96e3ec2e81939b1.xml 2025-09-07T07:34:42.4856426Z ============================= test session starts ============================== 2025-09-07T07:34:42.4856613Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4856652Z cachedir: .pytest_cache 2025-09-07T07:34:42.4856808Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4856853Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4856890Z configfile: pytest.ini 2025-09-07T07:34:42.4858044Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4858121Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.4858349Z stepcurrent: skipping 62 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4858424Z Running 1 items in this shard 2025-09-07T07:34:42.4858426Z 2025-09-07T07:34:42.4858623Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.6177s] [100%] 2025-09-07T07:34:42.4858815Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2600s] [100%] 2025-09-07T07:34:42.4858984Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True FAILED [0.2539s] [100%] 2025-09-07T07:34:42.4858986Z 2025-09-07T07:34:42.4859034Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4859140Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4859182Z Traceback (most recent call last): 2025-09-07T07:34:42.4859335Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4859370Z self._run_test( 2025-09-07T07:34:42.4859484Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4859538Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4859579Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4859734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4859782Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4859821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4859972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.4860019Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4860058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4860218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4861238Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4861276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4861420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4861502Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4861564Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4861717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4861763Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4861913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4861969Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4862011Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4862154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4862204Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4862242Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4862360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4862426Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4862470Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4862596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4862659Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.4862717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4862858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4862901Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4863904Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4864043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4864083Z return aot_autograd( 2025-09-07T07:34:42.4864117Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4864254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4864323Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4864368Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4864527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4864614Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4864659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4864848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4864939Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4865127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4865166Z fx_g = _create_graph( 2025-09-07T07:34:42.4865201Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4865363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4865397Z fx_g = make_fx( 
2025-09-07T07:34:42.4865431Z ^^^^^^^^ 2025-09-07T07:34:42.4865599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4865643Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4865680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4865827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4866916Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4866981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4867142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4867180Z t = dispatch_trace( 2025-09-07T07:34:42.4867213Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4867326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4867369Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4867408Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4867533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4867575Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4867610Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4867773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4867852Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4867893Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4868017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4868055Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4868089Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4868216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.4868279Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4868314Z ^^^^^^^^^ 2025-09-07T07:34:42.4868445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4869462Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4869499Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4869650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4869699Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4869733Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4869889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4869950Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4869997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4870174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4870213Z outs_pair = fn(*args) 2025-09-07T07:34:42.4870247Z ^^^^^^^^^ 2025-09-07T07:34:42.4870439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4870507Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4870552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4870724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4870763Z outs_pair = fn(*args) 2025-09-07T07:34:42.4870797Z ^^^^^^^^^ 2025-09-07T07:34:42.4870978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4871060Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4871102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4871300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4871386Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4872402Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4872576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4872614Z outs_pair = fn(*args) 2025-09-07T07:34:42.4872649Z ^^^^^^^^^ 2025-09-07T07:34:42.4872839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4872886Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4872922Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4873090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4873138Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4873175Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4873300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4873343Z return handle_torch_function( 2025-09-07T07:34:42.4873378Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4873519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4873612Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4873660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4873826Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4873866Z return func(*args, **kwargs) 2025-09-07T07:34:42.4873902Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4874029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4874071Z result = _engine_run_backward( 2025-09-07T07:34:42.4874105Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4875211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4875332Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4875384Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4875511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4875553Z return user_fn(self, *args) 2025-09-07T07:34:42.4875588Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4875754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4875798Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4875834Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4875992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4876036Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4876072Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4876195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4876249Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4876286Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4876449Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4876578Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4876618Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4876780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4876829Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4876867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4878009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4878061Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4878102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4878263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4878300Z t = dispatch_trace( 2025-09-07T07:34:42.4878333Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4878448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4878490Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4878528Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4878650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4878689Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4878722Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4878882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4878986Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4879027Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4879149Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4879187Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4879221Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4879349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4879389Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4879423Z ^^^^^^^^^ 2025-09-07T07:34:42.4879572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4880672Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4880706Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4880751Z File "", line 1, in 2025-09-07T07:34:42.4880894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4880972Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4881018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4881182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4881231Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4881268Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4881460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4881502Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4881538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4881709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4881775Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4881811Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4881955Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4881997Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4882033Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4882180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4882269Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4882314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4883409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4883473Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4883516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4883641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4883680Z leaves = list(leaves) 2025-09-07T07:34:42.4883714Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4883840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4883875Z return func(x) 2025-09-07T07:34:42.4883907Z ^^^^^^^ 2025-09-07T07:34:42.4884045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4884110Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4884151Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4884338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4884379Z return func(*args, **kwargs) 2025-09-07T07:34:42.4884414Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4884594Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4884680Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.4884683Z 2025-09-07T07:34:42.4884889Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4884892Z 2025-09-07T07:34:42.4884894Z 2025-09-07T07:34:42.4884966Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4885159Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4885164Z 2025-09-07T07:34:42.4885250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4885323Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4886326Z inline_call [] 2025-09-07T07:34:42.4886385Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4886437Z inductor [] 2025-09-07T07:34:42.4886577Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4886649Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4886908Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4887019Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4887119Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4887269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4887354Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4887488Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4887629Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4887736Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4887779Z Traceback (most recent call last): 2025-09-07T07:34:42.4887926Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4887967Z self._run_test( 2025-09-07T07:34:42.4888080Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4888135Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4888175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4888307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4888354Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4889383Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4889536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4889582Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4889620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4889755Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4889827Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4889865Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4890008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4890089Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4890126Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4890279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4890324Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4890474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4890526Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4890567Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4890710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4890763Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4890801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4890917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4891003Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4891051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4891176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4892235Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4892277Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4892416Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4892481Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4892519Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4892655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4892694Z return aot_autograd( 2025-09-07T07:34:42.4892729Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4892865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4892948Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4892993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4893155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4893237Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4893287Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4893473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4893516Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4893704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4893744Z fx_g = _create_graph( 2025-09-07T07:34:42.4893779Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4893943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4893977Z fx_g = make_fx( 2025-09-07T07:34:42.4894969Z ^^^^^^^^ 2025-09-07T07:34:42.4895121Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4895196Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4895233Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4895378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4895420Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4895457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4895616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4895654Z t = dispatch_trace( 2025-09-07T07:34:42.4895687Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4895800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4895840Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4895875Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4896003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4896043Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4896079Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4896239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4896330Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4896371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4896568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4896606Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4896641Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4897751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4897793Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4897860Z ^^^^^^^^^ 2025-09-07T07:34:42.4897994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4898034Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4898068Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4898217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4898267Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4898319Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4898475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4898536Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4898580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4898754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4898795Z outs_pair = fn(*args) 2025-09-07T07:34:42.4898829Z ^^^^^^^^^ 2025-09-07T07:34:42.4899000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4899065Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4899109Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4899283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4899321Z outs_pair = fn(*args) 2025-09-07T07:34:42.4899356Z ^^^^^^^^^ 2025-09-07T07:34:42.4899535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4900589Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4900631Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4900826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4900895Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4900941Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4901113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4901151Z outs_pair = fn(*args) 2025-09-07T07:34:42.4901185Z ^^^^^^^^^ 2025-09-07T07:34:42.4901373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4901419Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4901458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4901625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4901670Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4901706Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4901852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4901896Z return handle_torch_function( 2025-09-07T07:34:42.4901932Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4902072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4902147Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4902191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4902375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.4903383Z return func(*args, **kwargs) 2025-09-07T07:34:42.4903421Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4903545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4903588Z result = _engine_run_backward( 2025-09-07T07:34:42.4903622Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4903785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4903905Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4903954Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4904083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4904127Z return user_fn(self, *args) 2025-09-07T07:34:42.4904162Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4904305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4904349Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4904385Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4904542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4904585Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4904622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4904745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4904784Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4904834Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4905001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4905051Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4906055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4906193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4906242Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4906280Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4906443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4906634Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4906674Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4906833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4906876Z t = dispatch_trace( 2025-09-07T07:34:42.4906910Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4907022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4907064Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4907101Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4907253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4907292Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4907327Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4907486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4907564Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4907606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4907752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4907789Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4908820Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4908948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4908991Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4909024Z ^^^^^^^^^ 2025-09-07T07:34:42.4909197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4909246Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4909279Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4909320Z File "", line 1, in 2025-09-07T07:34:42.4909463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4909543Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4909589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4909728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4909776Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4909815Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4910006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4910049Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4910084Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4910254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4910319Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4910355Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4910497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4910540Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4911542Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4911677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4911766Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4911811Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4911934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4911993Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4912038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4912165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4912202Z leaves = list(leaves) 2025-09-07T07:34:42.4912236Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4912357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4912413Z return func(x) 2025-09-07T07:34:42.4912446Z ^^^^^^^ 2025-09-07T07:34:42.4912587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4912651Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4912692Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4912859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4912901Z return func(*args, **kwargs) 2025-09-07T07:34:42.4912953Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4913135Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4913219Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4913222Z 2025-09-07T07:34:42.4914440Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4914443Z 2025-09-07T07:34:42.4914445Z 2025-09-07T07:34:42.4914518Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4914710Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4914713Z 2025-09-07T07:34:42.4914798Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4914875Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4914909Z inline_call [] 2025-09-07T07:34:42.4914965Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4914999Z inductor [] 2025-09-07T07:34:42.4915073Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4915144Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4915405Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4915516Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4915594Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4915763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4915851Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4915981Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4916102Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4916172Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4916207Z inline_call [] 2025-09-07T07:34:42.4916262Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4916334Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4916403Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4917717Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4917832Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4917906Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4918080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4918166Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4918295Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4918412Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4918461Z =================================== FAILURES =================================== 2025-09-07T07:34:42.4918589Z _ WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4918632Z Traceback (most recent call last): 2025-09-07T07:34:42.4918778Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1267, in test_while_loop_with_outer_code 2025-09-07T07:34:42.4918812Z self._run_test( 2025-09-07T07:34:42.4918927Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4918983Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4919045Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4919178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4919224Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4919264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4919413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4919462Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4919500Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4920709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4920754Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4920794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4920938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4921019Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4921056Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4921208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4921279Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4921431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4921483Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4921523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4921665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4921716Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4921754Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4921870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4921936Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4921983Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4922110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4922175Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4922216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4922355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4922398Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4923426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4923566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4923606Z return aot_autograd( 2025-09-07T07:34:42.4923640Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4923775Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4923844Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4923908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4924069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4924150Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4924197Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4924392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4924436Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4924621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4924660Z fx_g = _create_graph( 2025-09-07T07:34:42.4924694Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4924859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4924893Z fx_g = make_fx( 2025-09-07T07:34:42.4924926Z ^^^^^^^^ 2025-09-07T07:34:42.4925078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4925125Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4925163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4926280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4926324Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4926360Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4926592Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4926660Z t = dispatch_trace( 2025-09-07T07:34:42.4926696Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4926810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4926850Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4926885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4927009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4927049Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4927085Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4927248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4927326Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4927366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4927490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4927532Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4927567Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4927692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4927733Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4927766Z ^^^^^^^^^ 2025-09-07T07:34:42.4927931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4928954Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4928991Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4929141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4929191Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4929224Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.4929380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4929469Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4929512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4929686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4929726Z outs_pair = fn(*args) 2025-09-07T07:34:42.4929760Z ^^^^^^^^^ 2025-09-07T07:34:42.4929955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4930021Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4930066Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4930238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4930279Z outs_pair = fn(*args) 2025-09-07T07:34:42.4930312Z ^^^^^^^^^ 2025-09-07T07:34:42.4930489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4930548Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4930591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4930786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4931819Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4931866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4932038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.4932098Z outs_pair = fn(*args) 2025-09-07T07:34:42.4932132Z ^^^^^^^^^ 2025-09-07T07:34:42.4932326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4932370Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4932408Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4932577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4932623Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4932659Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4932784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4932826Z return handle_torch_function( 2025-09-07T07:34:42.4932864Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4933006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4933081Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4933125Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4933306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4933348Z return func(*args, **kwargs) 2025-09-07T07:34:42.4933384Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4933506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4933548Z result = _engine_run_backward( 2025-09-07T07:34:42.4934550Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4934697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4934844Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4934894Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4935020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4935062Z return user_fn(self, *args) 2025-09-07T07:34:42.4935097Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4935263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4935308Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4935343Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4935501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4935547Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4935583Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4935708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4935747Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4935781Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4935948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4935999Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4936038Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4936173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4936222Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4936259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4937488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.4937536Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4937574Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4937732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4937771Z t = dispatch_trace( 2025-09-07T07:34:42.4937805Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4937919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4937960Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4937996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4938118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4938157Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4938194Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4938359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4938438Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4938479Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4938632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4938672Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4938706Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4938832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4938873Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4938906Z ^^^^^^^^^ 2025-09-07T07:34:42.4940029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4940112Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4940146Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4940186Z File "", line 1, in 
2025-09-07T07:34:42.4940333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4940410Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4940480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4940617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4940664Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4940701Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4940895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4940939Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4940974Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4941146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4941189Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4941227Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4941374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4941415Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4941449Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4941583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4941669Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4941739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4942839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4942900Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.4942942Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4943069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4943107Z leaves = list(leaves) 2025-09-07T07:34:42.4943142Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4943265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4943299Z return func(x) 2025-09-07T07:34:42.4943331Z ^^^^^^^ 2025-09-07T07:34:42.4943467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4943537Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4943578Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4943745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4943785Z return func(*args, **kwargs) 2025-09-07T07:34:42.4943820Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4944021Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4944106Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4944108Z 2025-09-07T07:34:42.4944315Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4944318Z 2025-09-07T07:34:42.4944320Z 2025-09-07T07:34:42.4944409Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4944602Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4944605Z 2025-09-07T07:34:42.4944689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4945735Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4945771Z inline_call [] 2025-09-07T07:34:42.4945849Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4945884Z inductor [] 2025-09-07T07:34:42.4945958Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4946028Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4946287Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4946400Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4946475Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4946698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4946785Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4946915Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4947034Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4947104Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4947168Z inline_call [] 2025-09-07T07:34:42.4947224Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4947296Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4947365Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4947623Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4947733Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4947807Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4947956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4949023Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4949155Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4949277Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4949346Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4949381Z inline_call [] 2025-09-07T07:34:42.4949460Z stats [('calls_captured', 13), ('unique_graphs', 1)] 2025-09-07T07:34:42.4949532Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4949602Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4949855Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.4949962Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 838, in forward 2025-09-07T07:34:42.4950059Z _, f, g = torch._higher_order_ops.while_loop(cond_fn, body_fn, [c, d, e]) 2025-09-07T07:34:42.4950207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4950290Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4950418Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.4950560Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4950779Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-c96e3ec2e81939b1.xml - 2025-09-07T07:34:42.4950837Z =========================== short test summary info ============================ 2025-09-07T07:34:42.4951199Z FAILED [0.2539s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4951285Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4951288Z 2025-09-07T07:34:42.4951495Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4951497Z 2025-09-07T07:34:42.4951500Z 2025-09-07T07:34:42.4951571Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4952733Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.4952735Z 2025-09-07T07:34:42.4952821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4952909Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.4952975Z ================== 1 failed, 245 deselected, 2 rerun in 1.35s ================== 2025-09-07T07:34:42.4953010Z Got exit code 1 2025-09-07T07:34:42.4953134Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.4953558Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.4953597Z import pkg_resources 2025-09-07T07:34:42.4953768Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4ba2ae02f18ca649.xml 2025-09-07T07:34:42.4953828Z ============================= test session starts ============================== 2025-09-07T07:34:42.4953946Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.4953986Z cachedir: .pytest_cache 2025-09-07T07:34:42.4954144Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.4954188Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.4954248Z configfile: pytest.ini 2025-09-07T07:34:42.4954411Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.4954487Z collecting ... collected 467 items / 63 deselected / 404 selected 2025-09-07T07:34:42.4954538Z stepcurrent: skipping 63 already run items. 
2025-09-07T07:34:42.4954578Z Running 183 items in this shard 2025-09-07T07:34:42.4954581Z 2025-09-07T07:34:42.4954755Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_True_autograd_False PASSED [2.9845s] [ 0%] 2025-09-07T07:34:42.4954968Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.5738s] [ 1%] 2025-09-07T07:34:42.4956134Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2230s] [ 1%] 2025-09-07T07:34:42.4956317Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True FAILED [0.2248s] [ 1%] 2025-09-07T07:34:42.4956319Z 2025-09-07T07:34:42.4956368Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.4956475Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4956599Z Traceback (most recent call last): 2025-09-07T07:34:42.4956747Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.4956787Z self._run_test( 2025-09-07T07:34:42.4956898Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4956956Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4956996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4957134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4957180Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4957219Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4957372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in 
_wrapped_call_impl 2025-09-07T07:34:42.4957418Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4957455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4957620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4957665Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4957702Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4957844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4957925Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4957965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4959104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4959152Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4959303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4959356Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4959399Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4959542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4959592Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4959631Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4959774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4959843Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4959887Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4960014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4960077Z return 
lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4960119Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4960337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4960382Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4960418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4960558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4960598Z return aot_autograd( 2025-09-07T07:34:42.4960633Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4960786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4961832Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4961878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4962039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4962125Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4962170Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4962350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4962395Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4962583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4962622Z fx_g = _create_graph( 2025-09-07T07:34:42.4962657Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4962821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4962855Z 
fx_g = make_fx( 2025-09-07T07:34:42.4962908Z ^^^^^^^^ 2025-09-07T07:34:42.4963062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4963107Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4963145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4963291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4963336Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4963372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4963531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4963568Z t = dispatch_trace( 2025-09-07T07:34:42.4963602Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4964681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4964726Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4964762Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4964886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4964925Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4964961Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4965165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4965246Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4965287Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4965410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4965447Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4965482Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4965608Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4965668Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4965701Z ^^^^^^^^^ 2025-09-07T07:34:42.4965832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4965873Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4965907Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4966070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4966121Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4966155Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4966312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4967523Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4967570Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4967748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4967786Z outs_pair = fn(*args) 2025-09-07T07:34:42.4967822Z ^^^^^^^^^ 2025-09-07T07:34:42.4967995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4968063Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4968106Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4968278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4968316Z outs_pair = fn(*args) 2025-09-07T07:34:42.4968350Z ^^^^^^^^^ 2025-09-07T07:34:42.4968528Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4968618Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.4968661Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4968856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4968927Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4968973Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4969143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4969181Z outs_pair = fn(*args) 2025-09-07T07:34:42.4969215Z ^^^^^^^^^ 2025-09-07T07:34:42.4969403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4970423Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4970461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4970630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4970705Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4970742Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4970869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4970911Z return handle_torch_function( 2025-09-07T07:34:42.4970946Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4971088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4971163Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.4971237Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4971405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4971446Z return func(*args, **kwargs) 2025-09-07T07:34:42.4971481Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4971606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.4971665Z result = _engine_run_backward( 2025-09-07T07:34:42.4971702Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4971848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.4971969Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4972018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4972147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.4972187Z return user_fn(self, *args) 2025-09-07T07:34:42.4973193Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4973341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.4973385Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.4973422Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4973584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.4973628Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.4973665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4973788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4973856Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.4973891Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4974056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.4974107Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.4974147Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4974285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.4974333Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.4974371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4974532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.4974579Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.4974620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4974778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4974815Z t = dispatch_trace( 2025-09-07T07:34:42.4975914Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4976032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4976094Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4976131Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4976255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4976293Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4976328Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4976551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4976659Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4976699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.4976824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4976861Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4976896Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4977024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4977084Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4977118Z ^^^^^^^^^ 2025-09-07T07:34:42.4977268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4977316Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4977349Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4977390Z File "", line 1, in 2025-09-07T07:34:42.4977537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.4977615Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.4978722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4978863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.4978910Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.4978949Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4979141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4979184Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4979219Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4979422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.4979468Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.4979505Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.4979647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.4979690Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.4979726Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4979863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.4979951Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.4979998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4980122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.4980184Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.4980226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4980351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.4980389Z leaves = list(leaves) 2025-09-07T07:34:42.4981396Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.4981546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.4981584Z return func(x) 2025-09-07T07:34:42.4981616Z ^^^^^^^ 2025-09-07T07:34:42.4981754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.4981819Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.4981860Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4982029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.4982084Z return func(*args, **kwargs) 2025-09-07T07:34:42.4982120Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4982300Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.4982386Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.4982389Z 2025-09-07T07:34:42.4982613Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.4982617Z 2025-09-07T07:34:42.4982619Z 2025-09-07T07:34:42.4982691Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.4982886Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.4982890Z 2025-09-07T07:34:42.4982975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.4983051Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.4983085Z inline_call [] 2025-09-07T07:34:42.4983139Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.4983213Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.4983286Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.4983543Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.4984628Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.4984699Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.4984855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.4984939Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.4985071Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.4985191Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.4985299Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.4985341Z Traceback (most recent call last): 2025-09-07T07:34:42.4985486Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.4985521Z self._run_test( 2025-09-07T07:34:42.4985634Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.4985693Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.4985732Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4985864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.4985910Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.4985965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4986117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.4986164Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.4986202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4986337Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.4986381Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.4987474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4987650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.4987731Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.4987768Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4987921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.4987965Z raise BackendCompilerFailed( 2025-09-07T07:34:42.4988135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.4988189Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4988228Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4988368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.4988425Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.4988462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4988578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.4988643Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.4988689Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4988817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.4988880Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.4988923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4989065Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.4989109Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.4989171Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4989308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.4990318Z return aot_autograd( 2025-09-07T07:34:42.4990355Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.4990492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.4990562Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.4990607Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4990770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.4990851Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.4990896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4991082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.4991125Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.4991310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.4991369Z fx_g = _create_graph( 2025-09-07T07:34:42.4991405Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4991570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.4991604Z fx_g = make_fx( 2025-09-07T07:34:42.4991636Z ^^^^^^^^ 2025-09-07T07:34:42.4991787Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.4991833Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.4991890Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4992037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.4992079Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.4993081Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4993241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.4993279Z t = dispatch_trace( 2025-09-07T07:34:42.4993333Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4993447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.4993488Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.4993523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4993649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4993691Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4993727Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4993886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.4993966Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.4994005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4994132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.4994170Z return fn(*args, **kwargs) 2025-09-07T07:34:42.4994204Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4994330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.4994372Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.4994405Z ^^^^^^^^^ 2025-09-07T07:34:42.4994555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.4994596Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.4994631Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4995745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.4995796Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.4995830Z ^^^^^^^^^^^ 2025-09-07T07:34:42.4995989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.4996050Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.4996094Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4996269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4996311Z outs_pair = fn(*args) 2025-09-07T07:34:42.4996345Z ^^^^^^^^^ 2025-09-07T07:34:42.4996580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.4996648Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.4996691Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4996890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4996929Z outs_pair = fn(*args) 2025-09-07T07:34:42.4996964Z ^^^^^^^^^ 2025-09-07T07:34:42.4997141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.4997201Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.4997245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4997460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.4997530Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.4997575Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4997763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.4998787Z outs_pair = fn(*args) 2025-09-07T07:34:42.4998820Z ^^^^^^^^^ 2025-09-07T07:34:42.4999011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.4999054Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.4999092Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4999264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.4999310Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.4999347Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4999472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.4999516Z return handle_torch_function( 2025-09-07T07:34:42.4999555Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4999696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.4999771Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.4999815Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.4999981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5000047Z return func(*args, **kwargs) 2025-09-07T07:34:42.5000082Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5000329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5000371Z result = _engine_run_backward( 2025-09-07T07:34:42.5000406Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5000553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5001670Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5001720Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5001845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5001891Z return user_fn(self, *args) 2025-09-07T07:34:42.5001926Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5002070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5002114Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5002149Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5002327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5002372Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5002409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5002532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5002572Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5002606Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5002773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5002841Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5002881Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5003017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5003067Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5003104Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5003285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5003332Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5004343Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5004503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5004544Z t = dispatch_trace( 2025-09-07T07:34:42.5004578Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5004691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5004733Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5004769Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5004893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5004932Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5004968Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5005128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5005207Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5005247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5005371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5005432Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5005467Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5005593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5005634Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5005667Z ^^^^^^^^^ 2025-09-07T07:34:42.5005817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5005866Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5006923Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5006966Z File "", line 1, in 2025-09-07T07:34:42.5007110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5007186Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5007235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5007369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5007417Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5007454Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5007672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5007715Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5007750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5007920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5007965Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5008002Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5008164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5008207Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5008243Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5008377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5008466Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5008527Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5008652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5008712Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5009733Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5009862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5009902Z leaves = list(leaves) 2025-09-07T07:34:42.5009937Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5010059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5010094Z return func(x) 2025-09-07T07:34:42.5010126Z ^^^^^^^ 2025-09-07T07:34:42.5010265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5010329Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5010370Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5010537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5010577Z return func(*args, **kwargs) 2025-09-07T07:34:42.5010635Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5010816Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5010900Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5010902Z 2025-09-07T07:34:42.5011110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5011112Z 2025-09-07T07:34:42.5011114Z 2025-09-07T07:34:42.5011187Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5011382Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5011385Z 2025-09-07T07:34:42.5011469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5011546Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5011581Z inline_call [] 2025-09-07T07:34:42.5012602Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5012678Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5012749Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5013031Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5013144Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5013195Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5013347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5013435Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5013585Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5013704Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5013775Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5013811Z inline_call [] 2025-09-07T07:34:42.5013865Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5013950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5014020Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5014274Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5014385Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5014435Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5014585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5014669Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5014800Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5014918Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5015938Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5016045Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5016089Z Traceback (most recent call last): 2025-09-07T07:34:42.5016255Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5016293Z self._run_test( 2025-09-07T07:34:42.5016404Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5016458Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5016561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5016696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5016742Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5016780Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5016931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5016976Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5017015Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5017154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5017199Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5017235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5017378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5017482Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5017523Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5017678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5018703Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5018854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5018908Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5018973Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5019116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5019165Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5019204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5019320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5019405Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5019449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5019577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5019640Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5019685Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5019827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5019871Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5019908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5020045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5020085Z return aot_autograd( 2025-09-07T07:34:42.5020120Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5020257Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5020326Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5020371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5021502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5021616Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5021660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5021845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5021888Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5022076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5022115Z fx_g = _create_graph( 2025-09-07T07:34:42.5022150Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5022312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5022347Z fx_g = make_fx( 2025-09-07T07:34:42.5022382Z ^^^^^^^^ 2025-09-07T07:34:42.5022533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5022578Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5022615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5022776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5022819Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5022857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5023015Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5023053Z t = dispatch_trace( 2025-09-07T07:34:42.5023087Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5023200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5024206Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5024261Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5024387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5024426Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5024460Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5024626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5024721Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5024763Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5024888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5024927Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5024960Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5025089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5025130Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5025164Z ^^^^^^^^^ 2025-09-07T07:34:42.5025295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5025337Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5025374Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5025524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5025572Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5025606Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5025762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5025824Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5026912Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5027093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5027131Z outs_pair = fn(*args) 2025-09-07T07:34:42.5027165Z ^^^^^^^^^ 2025-09-07T07:34:42.5027339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5027407Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5027450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5027626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5027663Z outs_pair = fn(*args) 2025-09-07T07:34:42.5027698Z ^^^^^^^^^ 2025-09-07T07:34:42.5027879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5027938Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5027981Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5028200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5028273Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5028317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5028491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5028529Z outs_pair = fn(*args) 2025-09-07T07:34:42.5028563Z ^^^^^^^^^ 2025-09-07T07:34:42.5028753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5028819Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5028855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5030001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5030048Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5030085Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5030252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5030297Z return handle_torch_function( 2025-09-07T07:34:42.5030332Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5030473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5030548Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5030596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5030764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5030805Z return func(*args, **kwargs) 2025-09-07T07:34:42.5030839Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5030966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5031007Z result = _engine_run_backward( 2025-09-07T07:34:42.5031043Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5031188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5031309Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5031380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5031508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5031549Z return user_fn(self, *args) 2025-09-07T07:34:42.5031584Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5032696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5032742Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5032778Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5032934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5032979Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5033015Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5033138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5033181Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5033216Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5033381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5033434Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5033495Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5033634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5033683Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5033721Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5033883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5033933Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5033987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5034146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5034183Z t = dispatch_trace( 2025-09-07T07:34:42.5034217Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5034331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5035344Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5035401Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5035526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5035564Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5035599Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5035759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5035841Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5035881Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5036004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5036043Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5036076Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5036205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5036246Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5036280Z ^^^^^^^^^ 2025-09-07T07:34:42.5036429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5036548Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5036581Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5036648Z File "", line 1, in 
2025-09-07T07:34:42.5036794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5036872Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5036916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5038037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5038087Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5038126Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5038318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5038362Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5038396Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5038571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5038616Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5038652Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5038795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5038860Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5038896Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5039029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5039117Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5039161Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5039287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5039371Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5039415Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5039541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5039580Z leaves = list(leaves) 2025-09-07T07:34:42.5039613Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5039760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5040891Z return func(x) 2025-09-07T07:34:42.5040926Z ^^^^^^^ 2025-09-07T07:34:42.5041064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.5041129Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5041170Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5041337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5041380Z return func(*args, **kwargs) 2025-09-07T07:34:42.5041415Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5041594Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5041680Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5041683Z 2025-09-07T07:34:42.5041890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5041892Z 2025-09-07T07:34:42.5041895Z 2025-09-07T07:34:42.5041968Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5042161Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5042185Z 2025-09-07T07:34:42.5042270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5042344Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5042380Z inline_call [] 2025-09-07T07:34:42.5042433Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5042507Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5042579Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5042837Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5042948Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5043976Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5044132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5044219Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5044349Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5044488Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5044561Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5044596Z inline_call [] 2025-09-07T07:34:42.5044649Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5044721Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5044790Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5045060Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5045170Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5045221Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5045371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5045625Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5045755Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5045873Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5045943Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5045980Z inline_call [] 2025-09-07T07:34:42.5046032Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5046104Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5046173Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5047493Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5047606Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5047656Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5047803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5047888Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5048057Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5048175Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5048394Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4ba2ae02f18ca649.xml - 2025-09-07T07:34:42.5048454Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5048818Z FAILED [0.2248s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5048902Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5048906Z 2025-09-07T07:34:42.5049114Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5049116Z 2025-09-07T07:34:42.5049118Z 2025-09-07T07:34:42.5049189Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5049409Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5049412Z 2025-09-07T07:34:42.5049498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5049558Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.5049628Z ============= 1 failed, 1 passed, 63 deselected, 2 rerun in 4.29s ============== 2025-09-07T07:34:42.5049662Z Got exit code 1 2025-09-07T07:34:42.5049701Z Retrying single test... 2025-09-07T07:34:42.5050125Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5050194Z import pkg_resources 2025-09-07T07:34:42.5051348Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1fd6dbc5960f91c5.xml 2025-09-07T07:34:42.5051430Z ============================= test session starts ============================== 2025-09-07T07:34:42.5051545Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5051585Z cachedir: .pytest_cache 2025-09-07T07:34:42.5051742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5051788Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5051828Z configfile: pytest.ini 2025-09-07T07:34:42.5051991Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5052067Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.5052298Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5052340Z Running 1 items in this shard 2025-09-07T07:34:42.5052342Z 2025-09-07T07:34:42.5052539Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.5476s] [100%] 2025-09-07T07:34:42.5052731Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2442s] [100%] 2025-09-07T07:34:42.5052926Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True FAILED [0.2289s] [100%] 2025-09-07T07:34:42.5052928Z 2025-09-07T07:34:42.5052977Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5053084Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5053127Z Traceback (most recent call last): 2025-09-07T07:34:42.5053279Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5053314Z self._run_test( 2025-09-07T07:34:42.5053427Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5053481Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5054495Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5054631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5054680Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5054718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5054871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5054916Z 
return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5054972Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5055111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5055156Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5055192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5055335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5055419Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5055471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5055624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5055670Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5055823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5055877Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5055936Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5056079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5056131Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5056169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5056286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5057436Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5057483Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5057610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5057675Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.5057716Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5057858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5057902Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5057940Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5058078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5058155Z return aot_autograd( 2025-09-07T07:34:42.5058193Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5058329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5058398Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5058443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5058605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5058689Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5058734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5058917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5058962Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5059151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5059190Z fx_g = _create_graph( 2025-09-07T07:34:42.5060209Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5060374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5060441Z fx_g = make_fx( 2025-09-07T07:34:42.5060474Z 
^^^^^^^^ 2025-09-07T07:34:42.5060628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5060673Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5060711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5060859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5060931Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5060968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5061127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5061165Z t = dispatch_trace( 2025-09-07T07:34:42.5061198Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5061313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5061354Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5061409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5061534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5061575Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5061610Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5061773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5061852Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5061894Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5062994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5063036Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5063074Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5063205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.5063246Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5063280Z ^^^^^^^^^ 2025-09-07T07:34:42.5063414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5063456Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5063490Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5063660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5063710Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5063743Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5063901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5063962Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5064007Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5064182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5064222Z outs_pair = fn(*args) 2025-09-07T07:34:42.5064256Z ^^^^^^^^^ 2025-09-07T07:34:42.5064428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5064495Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5064539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5064712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5065727Z outs_pair = fn(*args) 2025-09-07T07:34:42.5065781Z ^^^^^^^^^ 2025-09-07T07:34:42.5065961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5066021Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5066063Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5066259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5066348Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5066393Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5066643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5066681Z outs_pair = fn(*args) 2025-09-07T07:34:42.5066717Z ^^^^^^^^^ 2025-09-07T07:34:42.5066930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5066976Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5067012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5067182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5067229Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5067267Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5067392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5067433Z return handle_torch_function( 2025-09-07T07:34:42.5067470Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5067613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5068670Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5068717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5068885Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5068925Z return func(*args, **kwargs) 2025-09-07T07:34:42.5068961Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5069120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5069165Z result = _engine_run_backward( 2025-09-07T07:34:42.5069199Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5069347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5069471Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5069522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5069651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5069692Z return user_fn(self, *args) 2025-09-07T07:34:42.5069727Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5069873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5069920Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5069957Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5070115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5070159Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5070195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5070341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5070381Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5071391Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5071558Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5071610Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5071650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5071820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5071869Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5071907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5072069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5072116Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5072177Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5072336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5072375Z t = dispatch_trace( 2025-09-07T07:34:42.5072408Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5072522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5072566Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5072603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5072726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5072765Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5072800Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5072964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5073042Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5074057Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5074181Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5074219Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5074253Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5074402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5074445Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5074479Z ^^^^^^^^^ 2025-09-07T07:34:42.5074627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5074677Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5074710Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5074751Z File "", line 1, in 2025-09-07T07:34:42.5074895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5074973Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5075017Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5075152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5075203Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5075240Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5075436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5075479Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5075528Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5075700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5075745Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5076835Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5076981Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5077023Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5077087Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5077222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5077311Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5077355Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5077483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5077561Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5077606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5077732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5077771Z leaves = list(leaves) 2025-09-07T07:34:42.5077805Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5077930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5077965Z return func(x) 2025-09-07T07:34:42.5077998Z ^^^^^^^ 2025-09-07T07:34:42.5078136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5078200Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5078243Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5078411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5078452Z return func(*args, **kwargs) 2025-09-07T07:34:42.5078487Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5079653Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5079738Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.5079773Z 2025-09-07T07:34:42.5079979Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5079982Z 2025-09-07T07:34:42.5079984Z 2025-09-07T07:34:42.5080057Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5080345Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5080348Z 2025-09-07T07:34:42.5080434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5080507Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5080543Z inline_call [] 2025-09-07T07:34:42.5080596Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5080635Z inductor [] 2025-09-07T07:34:42.5080710Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5080781Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5081042Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5081173Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5081227Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5081378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5081464Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5081595Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5081737Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5081843Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5081887Z Traceback (most recent call last): 2025-09-07T07:34:42.5083017Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5083056Z self._run_test( 2025-09-07T07:34:42.5083185Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5083241Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5083281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5083413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5083458Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5083503Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5083653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5083699Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5083737Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5083876Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5083919Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5083958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5084100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5084182Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5084220Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5084390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5084437Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5084587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5084639Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5085650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5085797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5085849Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5085888Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5086006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5086070Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5086116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5086244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5086307Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5086349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5086592Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5086639Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5086676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5086815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5086854Z return aot_autograd( 2025-09-07T07:34:42.5086889Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5087025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5087124Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5087168Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5087329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5087412Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5087478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5088643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5088688Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5088877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5088920Z fx_g = _create_graph( 2025-09-07T07:34:42.5088954Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5089118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5089152Z fx_g = make_fx( 2025-09-07T07:34:42.5089185Z ^^^^^^^^ 2025-09-07T07:34:42.5089339Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5089386Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5089424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5089571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5089613Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5089649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5089808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5089892Z t = dispatch_trace( 2025-09-07T07:34:42.5089926Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5090039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5090081Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5090116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5090243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5090281Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5091293Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5091456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5091534Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5091577Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5091702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5091740Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5091775Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5091902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5091964Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5091999Z ^^^^^^^^^ 2025-09-07T07:34:42.5092133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5092173Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5092208Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5092358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5092408Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5092464Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5092620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5092684Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5092727Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5092918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5092957Z outs_pair = fn(*args) 2025-09-07T07:34:42.5093962Z ^^^^^^^^^ 2025-09-07T07:34:42.5094135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5094202Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5094246Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5094425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5094463Z outs_pair = fn(*args) 2025-09-07T07:34:42.5094497Z ^^^^^^^^^ 2025-09-07T07:34:42.5094675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5094736Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5094779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5094976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5095045Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5095091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5095287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5095326Z outs_pair = fn(*args) 2025-09-07T07:34:42.5095360Z ^^^^^^^^^ 2025-09-07T07:34:42.5095553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5095598Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5095637Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5095806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5095852Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5095888Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5097058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5097105Z return handle_torch_function( 2025-09-07T07:34:42.5097142Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5097284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5097358Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5097427Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5097597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5097638Z return func(*args, **kwargs) 2025-09-07T07:34:42.5097673Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5097797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5097838Z result = _engine_run_backward( 2025-09-07T07:34:42.5097875Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5098040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5098162Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5098210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5098357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5098399Z return user_fn(self, *args) 2025-09-07T07:34:42.5098435Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5098579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5098622Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5098658Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5099799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5099846Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5099884Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5100007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5100049Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5100085Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5100252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5100303Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5100343Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5100480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5100554Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5100593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5100755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5100802Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5100841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5101001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5101040Z t = dispatch_trace( 2025-09-07T07:34:42.5101074Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5101187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5101229Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5101265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5101391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5102398Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5102434Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5102596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5102673Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5102732Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5102859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5102896Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5102931Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5103057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5103098Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5103134Z ^^^^^^^^^ 2025-09-07T07:34:42.5103301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5103350Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5103383Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5103424Z File "", line 1, in 2025-09-07T07:34:42.5103568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5103667Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5103712Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5103847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5103895Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5103932Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5105100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5105145Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5105181Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5105353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5105397Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5105435Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5105578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5105620Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5105655Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5105789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5105909Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5105955Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5106079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5106139Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5106182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5106309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5106347Z leaves = list(leaves) 2025-09-07T07:34:42.5106382Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5106582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5106618Z return func(x) 2025-09-07T07:34:42.5106652Z ^^^^^^^ 2025-09-07T07:34:42.5106791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5107843Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5107886Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5108081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5108122Z return func(*args, **kwargs) 2025-09-07T07:34:42.5108159Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5108341Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5108424Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5108426Z 2025-09-07T07:34:42.5108633Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5108661Z 2025-09-07T07:34:42.5108663Z 2025-09-07T07:34:42.5108735Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5108928Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5108931Z 2025-09-07T07:34:42.5109016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5109108Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5109143Z inline_call [] 2025-09-07T07:34:42.5109198Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5109231Z inductor [] 2025-09-07T07:34:42.5109306Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5109377Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5109639Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5109751Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5109803Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5109955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5111020Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5111152Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5111272Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5111369Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5111405Z inline_call [] 2025-09-07T07:34:42.5111460Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5111533Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5111603Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5111858Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5111967Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5112017Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5112166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5112250Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5112383Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5112500Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5112551Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5112684Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5112730Z Traceback (most recent call last): 2025-09-07T07:34:42.5112878Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5112913Z self._run_test( 2025-09-07T07:34:42.5113025Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5113079Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5114094Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5114247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5114293Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5114332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5114486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5114532Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5114589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5114726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5114769Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5114807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5114948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5115031Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5115069Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5115221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5115267Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5115418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5115471Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5115511Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5115653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5115703Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5115760Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5116923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5116991Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5117036Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5117164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5117229Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5117270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5117412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5117456Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5117493Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5117631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5117673Z return aot_autograd( 2025-09-07T07:34:42.5117709Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5117844Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5117913Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5117986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5118149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5118231Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5118276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5118457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5118522Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5118707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5118747Z fx_g = _create_graph( 2025-09-07T07:34:42.5119765Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5119931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5119989Z fx_g = make_fx( 2025-09-07T07:34:42.5120022Z ^^^^^^^^ 2025-09-07T07:34:42.5120250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5120297Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5120334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5120482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5120527Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5120563Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5120725Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5120764Z t = dispatch_trace( 2025-09-07T07:34:42.5120800Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5120914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5120956Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5120991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5121116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5121155Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5121191Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5121377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5121455Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5121494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5122608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5122648Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5122685Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5122811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5122852Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5122886Z ^^^^^^^^^ 2025-09-07T07:34:42.5123018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5123060Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5123096Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5123244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5123294Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5123327Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5123504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5123568Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5123612Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5123791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5123829Z outs_pair = fn(*args) 2025-09-07T07:34:42.5123863Z ^^^^^^^^^ 2025-09-07T07:34:42.5124036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5124120Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5124163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5125307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5125346Z outs_pair = fn(*args) 2025-09-07T07:34:42.5125402Z ^^^^^^^^^ 2025-09-07T07:34:42.5125580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5125640Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5125682Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5125876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5125948Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5125993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5126167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5126205Z outs_pair = fn(*args) 2025-09-07T07:34:42.5126239Z ^^^^^^^^^ 2025-09-07T07:34:42.5126429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5126473Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5126600Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5126771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5126843Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5126879Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5127006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5127047Z return handle_torch_function( 2025-09-07T07:34:42.5127083Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5127224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5128286Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5128331Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5128499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5128542Z return func(*args, **kwargs) 2025-09-07T07:34:42.5128578Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5128702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5128744Z result = _engine_run_backward( 2025-09-07T07:34:42.5128779Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5128950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5129074Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5129122Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5129249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5129290Z return user_fn(self, *args) 2025-09-07T07:34:42.5129326Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5129493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5129537Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5129573Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5129731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5129775Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5129812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5129955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5130971Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5131007Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5131174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5131226Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5131269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5131405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5131453Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5131491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5131657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5131703Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5131742Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5131900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5131938Z t = dispatch_trace( 2025-09-07T07:34:42.5131971Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5132110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5132155Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5132191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5132315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5132353Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5132389Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5132550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5132628Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5133639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5133766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5133807Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5133844Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5133971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5134012Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5134046Z ^^^^^^^^^ 2025-09-07T07:34:42.5134216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5134265Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5134300Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5134342Z File "", line 1, in 
2025-09-07T07:34:42.5134486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5134563Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5134608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5134763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5134811Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5134848Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5135043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5135086Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5135135Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5135307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5136319Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5136357Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5136576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5136621Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5136656Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5136790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5136878Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5136926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5137052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5137113Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5137155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5137282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5137347Z leaves = list(leaves) 2025-09-07T07:34:42.5137383Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5137505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5137541Z return func(x) 2025-09-07T07:34:42.5137573Z ^^^^^^^ 2025-09-07T07:34:42.5137712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5137776Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5137818Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5137985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5138025Z return func(*args, **kwargs) 2025-09-07T07:34:42.5139043Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5139226Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5139313Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5139316Z 2025-09-07T07:34:42.5139523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5139525Z 2025-09-07T07:34:42.5139527Z 2025-09-07T07:34:42.5139625Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5139820Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5139823Z 2025-09-07T07:34:42.5139908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5139982Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5140016Z inline_call [] 2025-09-07T07:34:42.5140091Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5140124Z inductor [] 2025-09-07T07:34:42.5140199Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5140269Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5140527Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5140657Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5140710Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5140860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5140946Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5141077Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5141199Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5141268Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5142275Z inline_call [] 2025-09-07T07:34:42.5142331Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5142404Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5142474Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5142730Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5142838Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5142928Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5143077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5143164Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5143294Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5143413Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5143482Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5143516Z inline_call [] 2025-09-07T07:34:42.5143569Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5143640Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5143708Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5143964Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5144073Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5144122Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5144285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5144370Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5144499Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5145592Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5145814Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1fd6dbc5960f91c5.xml - 2025-09-07T07:34:42.5145891Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5146253Z FAILED [0.2289s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5146353Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5146355Z 2025-09-07T07:34:42.5146660Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5146663Z 2025-09-07T07:34:42.5146664Z 2025-09-07T07:34:42.5146736Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5146931Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5146934Z 2025-09-07T07:34:42.5147017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5147075Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.5147140Z ================== 1 failed, 245 deselected, 2 rerun in 1.35s ================== 2025-09-07T07:34:42.5147176Z Got exit code 1 2025-09-07T07:34:42.5147215Z Retrying single test... 2025-09-07T07:34:42.5147639Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5147704Z import pkg_resources 2025-09-07T07:34:42.5147874Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-36aaf2bef84c93cb.xml 2025-09-07T07:34:42.5147930Z ============================= test session starts ============================== 2025-09-07T07:34:42.5148043Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5148083Z cachedir: .pytest_cache 2025-09-07T07:34:42.5148240Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5148284Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5149313Z configfile: pytest.ini 2025-09-07T07:34:42.5149476Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5149552Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.5149787Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5149828Z Running 1 items in this shard 2025-09-07T07:34:42.5149831Z 2025-09-07T07:34:42.5150051Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4374s] [100%] 2025-09-07T07:34:42.5150245Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2287s] [100%] 2025-09-07T07:34:42.5150413Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True FAILED [0.2249s] [100%] 2025-09-07T07:34:42.5150415Z 2025-09-07T07:34:42.5150462Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5150570Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5150633Z Traceback (most recent call last): 2025-09-07T07:34:42.5150783Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5150817Z self._run_test( 2025-09-07T07:34:42.5150932Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5150987Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5151053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5151188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5151234Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5151274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5151426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5151475Z 
return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5151514Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5152657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5152701Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5152740Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5152885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5152968Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5153005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5153158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5153226Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5153379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5153431Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5153471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5153613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5153666Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5153706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5153823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5153888Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5153932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5154057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5154122Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.5154163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5154303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5155327Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5155385Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5155525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5155565Z return aot_autograd( 2025-09-07T07:34:42.5155599Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5155737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5155807Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5155872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5156033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5156115Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5156160Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5156359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5156403Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5156662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5156702Z fx_g = _create_graph( 2025-09-07T07:34:42.5156737Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5156902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5156939Z fx_g = make_fx( 2025-09-07T07:34:42.5156972Z 
^^^^^^^^ 2025-09-07T07:34:42.5157124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5157169Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5157207Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5158351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5158395Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5158431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5158591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5158629Z t = dispatch_trace( 2025-09-07T07:34:42.5158694Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5158809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5158850Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5158885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5159010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5159051Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5159086Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5159249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5159328Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5159369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5159495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5159536Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5159571Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5159697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.5159738Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5159772Z ^^^^^^^^^ 2025-09-07T07:34:42.5160986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5161030Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5161066Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5161216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5161266Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5161300Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5161457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5162238Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5162282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5162458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5162498Z outs_pair = fn(*args) 2025-09-07T07:34:42.5162532Z ^^^^^^^^^ 2025-09-07T07:34:42.5162727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5162795Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5162839Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5163015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5163056Z outs_pair = fn(*args) 2025-09-07T07:34:42.5163090Z ^^^^^^^^^ 2025-09-07T07:34:42.5163269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5163328Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5163371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5163570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5164653Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5164700Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5164876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5164939Z outs_pair = fn(*args) 2025-09-07T07:34:42.5164973Z ^^^^^^^^^ 2025-09-07T07:34:42.5165165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5165211Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5165248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5165419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5165466Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5165502Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5165629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5165671Z return handle_torch_function( 2025-09-07T07:34:42.5165711Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5165855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5165933Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5165978Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5166166Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5166209Z return func(*args, **kwargs) 2025-09-07T07:34:42.5166245Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5166371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5167539Z result = _engine_run_backward( 2025-09-07T07:34:42.5167577Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5167726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5167882Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5167932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5168060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5168104Z return user_fn(self, *args) 2025-09-07T07:34:42.5168140Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5168312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5168356Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5168393Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5168553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5168601Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5168637Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5168761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5168802Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5168837Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5169008Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5169061Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5169102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5169239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5169289Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5170331Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5170526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5170576Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5170615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5170775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5170815Z t = dispatch_trace( 2025-09-07T07:34:42.5170849Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5170965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5171008Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5171045Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5171169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5171210Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5171249Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5171412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5171491Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5171533Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5171677Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5171716Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5171752Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5171878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5171921Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5171954Z ^^^^^^^^^ 2025-09-07T07:34:42.5173110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5173187Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5173223Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5173265Z File "", line 1, in 2025-09-07T07:34:42.5173412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5173492Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5173540Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5173699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5173749Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5173787Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5173986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5174034Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5174072Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5174250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5174297Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5174334Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5174487Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5174531Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5174568Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5174707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5174797Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5175877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5176012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5176075Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5176120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5176252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5176293Z leaves = list(leaves) 2025-09-07T07:34:42.5176328Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5176456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5176572Z return func(x) 2025-09-07T07:34:42.5176606Z ^^^^^^^ 2025-09-07T07:34:42.5176751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5176821Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5176865Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5177039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5177081Z return func(*args, **kwargs) 2025-09-07T07:34:42.5177118Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5177340Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5177428Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.5177431Z 2025-09-07T07:34:42.5177644Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5177646Z 2025-09-07T07:34:42.5177650Z 2025-09-07T07:34:42.5177747Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5177950Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5177952Z 2025-09-07T07:34:42.5178040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5179155Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5179191Z inline_call [] 2025-09-07T07:34:42.5179277Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5179313Z inductor [] 2025-09-07T07:34:42.5179390Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5179464Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5179734Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5179853Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5179907Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5180062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5180153Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5180290Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5180415Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5180524Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5180569Z Traceback (most recent call last): 2025-09-07T07:34:42.5180743Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5180780Z self._run_test( 2025-09-07T07:34:42.5180897Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5180954Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5180994Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5181134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5182197Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5182239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5182396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5182444Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5182486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5182629Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5182673Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5182713Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5182864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5182967Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5183009Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5183167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5183214Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5183373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5183432Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5183494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5183651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5183706Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5183749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5183878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5183974Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5184022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5185246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5185315Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5185361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5185515Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5185562Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5185602Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5185756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5185798Z return aot_autograd( 2025-09-07T07:34:42.5185838Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5185990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5186067Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5186116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5186294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5186410Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5186460Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5186838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5186887Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5187091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5187135Z fx_g = _create_graph( 2025-09-07T07:34:42.5187173Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5187353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5188480Z fx_g = make_fx( 2025-09-07T07:34:42.5188518Z ^^^^^^^^ 2025-09-07T07:34:42.5188685Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5188735Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5188778Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5188966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5189015Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5189056Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5189231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5189272Z t = dispatch_trace( 2025-09-07T07:34:42.5189310Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5189434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5189505Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5189544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5189683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5189726Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5189766Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5189945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5190058Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5190103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5190241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5190282Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5191391Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5191534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5191582Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5191619Z ^^^^^^^^^ 2025-09-07T07:34:42.5191765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5191809Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5191849Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5192014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5192068Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5192106Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5192280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5192348Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5192425Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5192620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5192663Z outs_pair = fn(*args) 2025-09-07T07:34:42.5192701Z ^^^^^^^^^ 2025-09-07T07:34:42.5192892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5192966Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5193013Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5193207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5193249Z outs_pair = fn(*args) 2025-09-07T07:34:42.5194411Z ^^^^^^^^^ 2025-09-07T07:34:42.5194626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5194698Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5194746Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5194996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5195077Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5195130Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5195327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5195371Z outs_pair = fn(*args) 2025-09-07T07:34:42.5195409Z ^^^^^^^^^ 2025-09-07T07:34:42.5195631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5195703Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5195746Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5195942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5195996Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5196054Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5196201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5196250Z return handle_torch_function( 2025-09-07T07:34:42.5196291Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5196458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5196623Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5196680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5198017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5198065Z return func(*args, **kwargs) 2025-09-07T07:34:42.5198106Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5198254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5198302Z result = _engine_run_backward( 2025-09-07T07:34:42.5198344Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5198513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5198655Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5198747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5198894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5198943Z return user_fn(self, *args) 2025-09-07T07:34:42.5198985Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5199154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5199206Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5199248Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5199431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5199482Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5199524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5199666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5199715Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5199755Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5201202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5201263Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5201336Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5201496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5201554Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5201597Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5201786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5201844Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5201912Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5202097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5202140Z t = dispatch_trace( 2025-09-07T07:34:42.5202180Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5202312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5202362Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5202426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5202571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5202615Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5202656Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5202841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5202935Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5202982Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5203126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5204328Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5204371Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5204526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5204576Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5204616Z ^^^^^^^^^ 2025-09-07T07:34:42.5204794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5204852Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5204891Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5204964Z File "", line 1, in 2025-09-07T07:34:42.5205137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5205228Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5205281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5205444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5205501Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5205546Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5205773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5205825Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5205866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5206068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5206120Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5206164Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5206332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5207673Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5207718Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5207879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5207983Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5208037Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5208184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5208282Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5208332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5208482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5208527Z leaves = list(leaves) 2025-09-07T07:34:42.5208568Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5208733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5208776Z return func(x) 2025-09-07T07:34:42.5208814Z ^^^^^^^ 2025-09-07T07:34:42.5208978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5209054Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5209102Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5209302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5209350Z return func(*args, **kwargs) 2025-09-07T07:34:42.5209392Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5209605Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5210866Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5210870Z 2025-09-07T07:34:42.5211116Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5211120Z 2025-09-07T07:34:42.5211122Z 2025-09-07T07:34:42.5211208Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5211437Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5211469Z 2025-09-07T07:34:42.5211569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5211657Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5211698Z inline_call [] 2025-09-07T07:34:42.5211763Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5211804Z inductor [] 2025-09-07T07:34:42.5211891Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5211977Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5212283Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5212417Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5212481Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5212662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5212763Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5212934Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5213076Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5213161Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5213200Z inline_call [] 2025-09-07T07:34:42.5213265Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5213348Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5214589Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5214910Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5215042Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5215101Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5215295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5215396Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5215549Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5215691Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5215753Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5215878Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.5215930Z Traceback (most recent call last): 2025-09-07T07:34:42.5216102Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5216144Z self._run_test( 2025-09-07T07:34:42.5216276Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5216340Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5216387Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5216612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5216665Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5216712Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5216926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5216980Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5218191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5218354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5218406Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5218452Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5218619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5218714Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5218759Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5218937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5218995Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5219172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5219235Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5219282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5219477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5219540Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5219587Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5219722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5219799Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5219850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5220023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5220096Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5220145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5220310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5221519Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5221586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5221751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5221797Z return aot_autograd( 2025-09-07T07:34:42.5221838Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5221999Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5222086Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5222138Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5222328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5222425Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5222480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5222698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5222748Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5222970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5223041Z fx_g = _create_graph( 2025-09-07T07:34:42.5223084Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5223276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5223317Z fx_g = make_fx( 2025-09-07T07:34:42.5223355Z ^^^^^^^^ 2025-09-07T07:34:42.5223535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5223587Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5224774Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5224945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5224996Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5225036Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5225221Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5225270Z t = dispatch_trace( 2025-09-07T07:34:42.5225310Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5225442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5225491Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5225531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5225696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5225745Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5225787Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5225975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5226066Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5226113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5226279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5226324Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5226364Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5226598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5226647Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5226688Z ^^^^^^^^^ 2025-09-07T07:34:42.5228021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5228069Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5228110Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5228285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5228343Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5228387Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5228570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5228643Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5228693Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5228900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5228947Z outs_pair = fn(*args) 2025-09-07T07:34:42.5228988Z ^^^^^^^^^ 2025-09-07T07:34:42.5229188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5229265Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5229316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5229544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5229589Z outs_pair = fn(*args) 2025-09-07T07:34:42.5229629Z ^^^^^^^^^ 2025-09-07T07:34:42.5229837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5229908Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5229957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5231321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5231404Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5231457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5231661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5231708Z outs_pair = fn(*args) 2025-09-07T07:34:42.5231747Z ^^^^^^^^^ 2025-09-07T07:34:42.5231969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5232045Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5232088Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5232286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5232340Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5232382Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5232527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5232602Z return handle_torch_function( 2025-09-07T07:34:42.5232644Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5232809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5232895Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5232948Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5233160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5233208Z return func(*args, **kwargs) 2025-09-07T07:34:42.5233249Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5233397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5234579Z result = _engine_run_backward( 2025-09-07T07:34:42.5234623Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5234799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5234940Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5234997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5235147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5235196Z return user_fn(self, *args) 2025-09-07T07:34:42.5235238Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5235407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5235458Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5235498Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5235684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5235766Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5235809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5235952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5235999Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5236041Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5236236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5236295Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5236341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5236584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5237786Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5237836Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5238025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5238081Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5238125Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5238344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5238390Z t = dispatch_trace( 2025-09-07T07:34:42.5238431Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5238563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5238614Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5238655Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5238800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5238870Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5238910Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5239097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5239189Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5239237Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5239410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5239455Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5239496Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5239643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5239691Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5241016Z ^^^^^^^^^ 2025-09-07T07:34:42.5241197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5241253Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5241293Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5241341Z File "", line 1, in 
2025-09-07T07:34:42.5241510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5241603Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5241655Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5241814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5241868Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5241912Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5242165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5242218Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5242259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5242459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5242512Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5242555Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5242722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5242772Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5242813Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5242970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5243075Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5244270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5244419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5244490Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5244580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5244732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5244776Z leaves = list(leaves) 2025-09-07T07:34:42.5244817Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5244960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5245002Z return func(x) 2025-09-07T07:34:42.5245039Z ^^^^^^^ 2025-09-07T07:34:42.5245201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5245296Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5245344Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5245541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5245589Z return func(*args, **kwargs) 2025-09-07T07:34:42.5245632Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5245859Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5245960Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5245963Z 2025-09-07T07:34:42.5246206Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5246212Z 2025-09-07T07:34:42.5246215Z 2025-09-07T07:34:42.5246300Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5246609Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5246613Z 2025-09-07T07:34:42.5247862Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5247952Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5247993Z inline_call [] 2025-09-07T07:34:42.5248057Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5248097Z inductor [] 2025-09-07T07:34:42.5248183Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5248267Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5248602Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5248736Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5248796Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5248975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5249077Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5249232Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5249374Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5249459Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5249501Z inline_call [] 2025-09-07T07:34:42.5249564Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5249647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5249729Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5250045Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5250176Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5250235Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5251548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5251647Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5251829Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5251967Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5252049Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5252089Z inline_call [] 2025-09-07T07:34:42.5252152Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5252260Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5252342Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5252637Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5252764Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5252825Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5253001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5253099Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5253250Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5253389Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5253643Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-36aaf2bef84c93cb.xml - 2025-09-07T07:34:42.5253710Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5254130Z FAILED [0.2249s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5254249Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5254252Z 2025-09-07T07:34:42.5254495Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5254497Z 2025-09-07T07:34:42.5254500Z 2025-09-07T07:34:42.5255725Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5255950Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.5255953Z 2025-09-07T07:34:42.5256052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5256126Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.5256204Z ================== 1 failed, 245 deselected, 2 rerun in 1.10s ================== 2025-09-07T07:34:42.5256244Z Got exit code 1 2025-09-07T07:34:42.5256390Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.5257009Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5257056Z import pkg_resources 2025-09-07T07:34:42.5257255Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1db72ca20937eb31.xml 2025-09-07T07:34:42.5257320Z ============================= test session starts ============================== 2025-09-07T07:34:42.5257478Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5257524Z cachedir: .pytest_cache 2025-09-07T07:34:42.5257707Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5257759Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5257807Z configfile: pytest.ini 2025-09-07T07:34:42.5258016Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5258105Z collecting ... collected 467 items / 65 deselected / 402 selected 2025-09-07T07:34:42.5258164Z stepcurrent: skipping 65 already run items. 
2025-09-07T07:34:42.5258213Z Running 181 items in this shard 2025-09-07T07:34:42.5258216Z 2025-09-07T07:34:42.5258445Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.8674s] [ 0%] 2025-09-07T07:34:42.5259827Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7084s] [ 0%] 2025-09-07T07:34:42.5260022Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True FAILED [0.7481s] [ 0%] 2025-09-07T07:34:42.5260028Z 2025-09-07T07:34:42.5260086Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5260211Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5260261Z Traceback (most recent call last): 2025-09-07T07:34:42.5260437Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5260479Z self._run_test( 2025-09-07T07:34:42.5260641Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5260707Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5260755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5260914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5260968Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5261016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5261195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5261250Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5261295Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5261455Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5261507Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5261552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5261718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5261813Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5261857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5263195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5263251Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5263428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5263491Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5263538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5263705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5263797Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5263843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5263980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5264057Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5264109Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5264276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5264351Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5264400Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5264564Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5264620Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5264664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5264827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5264872Z return aot_autograd( 2025-09-07T07:34:42.5264914Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5265075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5266302Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5266357Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5266624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5266722Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5266805Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5267022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5267073Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5267292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5267340Z fx_g = _create_graph( 2025-09-07T07:34:42.5267382Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5267576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5267616Z fx_g = make_fx( 2025-09-07T07:34:42.5267656Z ^^^^^^^^ 2025-09-07T07:34:42.5267837Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5267894Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5267939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5268113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5268162Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5268204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5268412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5268459Z t = dispatch_trace( 2025-09-07T07:34:42.5268499Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5269784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5269836Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5269878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5270026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5270106Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5270149Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5270338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5270433Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5270481Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5270650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5270696Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5270737Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5270885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5270935Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5270975Z ^^^^^^^^^ 2025-09-07T07:34:42.5271134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5271181Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5271223Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5271398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5271459Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5271499Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5272823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5272896Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5272948Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5273154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5273227Z outs_pair = fn(*args) 2025-09-07T07:34:42.5273267Z ^^^^^^^^^ 2025-09-07T07:34:42.5273470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5273548Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5273601Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5273807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5273852Z outs_pair = fn(*args) 2025-09-07T07:34:42.5273892Z ^^^^^^^^^ 2025-09-07T07:34:42.5274100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5274171Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5274224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5274454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5274535Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5274606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5274813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5274859Z outs_pair = fn(*args) 2025-09-07T07:34:42.5274899Z ^^^^^^^^^ 2025-09-07T07:34:42.5275123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5276309Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5276376Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5276647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5276701Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5276744Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5276893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5276943Z return handle_torch_function( 2025-09-07T07:34:42.5277010Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5277177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5277265Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5277317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5277518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5277569Z return func(*args, **kwargs) 2025-09-07T07:34:42.5277611Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5277756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5277805Z result = _engine_run_backward( 2025-09-07T07:34:42.5277849Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5278023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5278166Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5278222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5278373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5278449Z return user_fn(self, *args) 2025-09-07T07:34:42.5279668Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5279842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5279894Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5279936Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5280125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5280220Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5280263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5280409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5280456Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5280497Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5280694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5280755Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5280802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5280963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5281048Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5281095Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5281286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5281342Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5281388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5281575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5281647Z t = dispatch_trace( 2025-09-07T07:34:42.5282834Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5282969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5283019Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5283061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5283210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5283274Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5283316Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5283505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5283597Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5283644Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5283796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5283840Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5283881Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5284028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5284077Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5284119Z ^^^^^^^^^ 2025-09-07T07:34:42.5284298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5284354Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5284395Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5284443Z File "", line 1, in 2025-09-07T07:34:42.5284611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5284722Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5285924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5286086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5286142Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5286186Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5286415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5286465Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5286579Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5286783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5286834Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5286883Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5287051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5287102Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5287143Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5287338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5287443Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5287498Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5287645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5287715Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5287766Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5287945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5287990Z leaves = list(leaves) 2025-09-07T07:34:42.5289188Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5289334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5289376Z return func(x) 2025-09-07T07:34:42.5289416Z ^^^^^^^ 2025-09-07T07:34:42.5289621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5289697Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5289745Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5289941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5289990Z return func(*args, **kwargs) 2025-09-07T07:34:42.5290036Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5290252Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5290353Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5290356Z 2025-09-07T07:34:42.5290603Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5290607Z 2025-09-07T07:34:42.5290609Z 2025-09-07T07:34:42.5290694Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5290922Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5290925Z 2025-09-07T07:34:42.5291025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5291137Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5291178Z inline_call [] 2025-09-07T07:34:42.5291243Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5291282Z inductor [] 2025-09-07T07:34:42.5291369Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5291454Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5292907Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5293041Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5293101Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5293279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5293386Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5293539Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5293680Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5293822Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5293877Z Traceback (most recent call last): 2025-09-07T07:34:42.5294049Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5294091Z self._run_test( 2025-09-07T07:34:42.5294223Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5294286Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5294337Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5294510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5294565Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5294610Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5294788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5294841Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5294905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5295064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5296253Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5296298Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5296466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5296645Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5296691Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5296869Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5296923Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5297101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5297163Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5297209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5297375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5297434Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5297509Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5297648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5297724Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5297775Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5297926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5297999Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5298049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5298212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5298263Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5298307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5299616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5299667Z return aot_autograd( 2025-09-07T07:34:42.5299707Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5299868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.5299948Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5300027Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5300218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5300315Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5300367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5300580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5300651Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5300870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5300916Z fx_g = _create_graph( 2025-09-07T07:34:42.5300957Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5301148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5301216Z fx_g = make_fx( 2025-09-07T07:34:42.5301254Z ^^^^^^^^ 2025-09-07T07:34:42.5301434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5301487Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5301531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5301702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5302886Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5302930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5303115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.5303159Z t = dispatch_trace( 2025-09-07T07:34:42.5303201Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5303335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5303384Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5303425Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5303571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5303616Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5303657Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5303870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5303964Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5304013Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5304159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5304205Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5304246Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5304395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5304443Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5304485Z ^^^^^^^^^ 2025-09-07T07:34:42.5304638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5304688Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5305864Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5306044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5306102Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5306142Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5306348Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5306423Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5306474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5306782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5306828Z outs_pair = fn(*args) 2025-09-07T07:34:42.5306869Z ^^^^^^^^^ 2025-09-07T07:34:42.5307073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5307181Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5307232Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5307437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5307482Z outs_pair = fn(*args) 2025-09-07T07:34:42.5307545Z ^^^^^^^^^ 2025-09-07T07:34:42.5307754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5307824Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5307874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5308101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5308186Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5308239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5309602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5309647Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.5309689Z ^^^^^^^^^ 2025-09-07T07:34:42.5309913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5309967Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5310009Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5310207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5310298Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5310342Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5310489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5310540Z return handle_torch_function( 2025-09-07T07:34:42.5310582Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5310750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5310838Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5310892Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5311089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5311138Z return func(*args, **kwargs) 2025-09-07T07:34:42.5311183Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5311330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5311379Z result = _engine_run_backward( 2025-09-07T07:34:42.5311421Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5312762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5312908Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.5312965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5313115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5313163Z return user_fn(self, *args) 2025-09-07T07:34:42.5313206Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5313379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5313451Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5313494Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5313682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5313735Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5313778Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5313941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5313987Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5314029Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5314223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5314286Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5314333Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5314493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5314551Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5314596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5314787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5315984Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.5316030Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5316218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5316262Z t = dispatch_trace( 2025-09-07T07:34:42.5316302Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5316458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5316581Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5316623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5316771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5316816Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5316858Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5317052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5317142Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5317190Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5317334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5317380Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5317425Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5317573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5317620Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5317661Z ^^^^^^^^^ 2025-09-07T07:34:42.5317837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5319078Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5319120Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5319170Z File "", line 1, in 2025-09-07T07:34:42.5319338Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5319428Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5319480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5319666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5319721Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5319766Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5319990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5320042Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5320105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5320418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5320471Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5320514Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5320683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5320739Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5320780Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5320938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5321042Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5321097Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5321246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5322468Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.5322519Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5322668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5322747Z leaves = list(leaves) 2025-09-07T07:34:42.5322790Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5322936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5322976Z return func(x) 2025-09-07T07:34:42.5323016Z ^^^^^^^ 2025-09-07T07:34:42.5323176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5323252Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5323301Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5323497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5323544Z return func(*args, **kwargs) 2025-09-07T07:34:42.5323586Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5323797Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5323901Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5323903Z 2025-09-07T07:34:42.5324149Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5324151Z 2025-09-07T07:34:42.5324153Z 2025-09-07T07:34:42.5324263Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5324491Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5324494Z 2025-09-07T07:34:42.5324594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5324680Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5325862Z inline_call [] 2025-09-07T07:34:42.5325929Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5325991Z inductor [] 2025-09-07T07:34:42.5326079Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5326163Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5326467Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5326701Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5326762Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5326940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5327040Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5327196Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5327337Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5327422Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5327462Z inline_call [] 2025-09-07T07:34:42.5327526Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5327612Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5327697Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5327999Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5328129Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5328214Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5328391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5328490Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5329794Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5329937Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5329997Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5330121Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5330172Z Traceback (most recent call last): 2025-09-07T07:34:42.5330343Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5330390Z self._run_test( 2025-09-07T07:34:42.5330521Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5330585Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5330632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5330809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5330865Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5330914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5331092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5331146Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5331192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5331355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5331437Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5331480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5331648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5331742Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5332932Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5333138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5333194Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5333371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5333433Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5333480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5333653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5333712Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5333758Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5333893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5333972Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5334024Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5334174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5334248Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5334296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5334460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5334533Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5334577Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5334738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5334785Z return aot_autograd( 2025-09-07T07:34:42.5334826Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5334988Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5336207Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5336263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5336451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5336630Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5336684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5336900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5336950Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5337197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5337243Z fx_g = _create_graph( 2025-09-07T07:34:42.5337285Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5337476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5337518Z fx_g = make_fx( 2025-09-07T07:34:42.5337556Z ^^^^^^^^ 2025-09-07T07:34:42.5337734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5337832Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5337878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5338049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5338100Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5338144Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5338351Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5338395Z t = dispatch_trace( 2025-09-07T07:34:42.5339590Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5339724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5339775Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5339818Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5339970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5340016Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5340058Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5340248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5340341Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5340391Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5340536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5340582Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5340622Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5340769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5340843Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5340885Z ^^^^^^^^^ 2025-09-07T07:34:42.5341041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5341088Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5341130Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5341307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5341365Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5341406Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5342725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5342799Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5342851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5343061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5343107Z outs_pair = fn(*args) 2025-09-07T07:34:42.5343148Z ^^^^^^^^^ 2025-09-07T07:34:42.5343352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5343448Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5343502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5343707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5343752Z outs_pair = fn(*args) 2025-09-07T07:34:42.5343791Z ^^^^^^^^^ 2025-09-07T07:34:42.5344000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5344089Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5344138Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5344367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5344450Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5344518Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5344722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5344767Z outs_pair = fn(*args) 2025-09-07T07:34:42.5344809Z ^^^^^^^^^ 2025-09-07T07:34:42.5345030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5346224Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5346268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5346468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5346621Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5346668Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5346817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5346867Z return handle_torch_function( 2025-09-07T07:34:42.5346908Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5347075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5347163Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5347251Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5347450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5347499Z return func(*args, **kwargs) 2025-09-07T07:34:42.5347541Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5347687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5347736Z result = _engine_run_backward( 2025-09-07T07:34:42.5347779Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5347952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5348093Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5348150Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5348300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5349494Z return user_fn(self, *args) 2025-09-07T07:34:42.5349539Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5349708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5349790Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5349833Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5350019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5350070Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5350112Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5350258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5350307Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5350378Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5350575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5350635Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5350681Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5350842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5350923Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5350970Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5351159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5351215Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5351260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5351451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5351495Z t = dispatch_trace( 2025-09-07T07:34:42.5352676Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5352812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5352862Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5352906Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5353051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5353097Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5353138Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5353327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5353417Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5353490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5353634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5353679Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5353718Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5353867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5353915Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5353957Z ^^^^^^^^^ 2025-09-07T07:34:42.5354136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5354193Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5354231Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5354280Z File "", line 1, in 
2025-09-07T07:34:42.5354446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5355676Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5355730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5355889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5355963Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5356009Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5356236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5356288Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5356329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5356589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5356680Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5356724Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5356891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5356941Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5356982Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5357163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5357268Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5357321Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5357468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5357537Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5357591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5357743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5357788Z leaves = list(leaves) 2025-09-07T07:34:42.5358977Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5359127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5359167Z return func(x) 2025-09-07T07:34:42.5359206Z ^^^^^^^ 2025-09-07T07:34:42.5359368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5359445Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5359492Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5359688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5359764Z return func(*args, **kwargs) 2025-09-07T07:34:42.5359807Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5360017Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5360116Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5360119Z 2025-09-07T07:34:42.5360445Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5360448Z 2025-09-07T07:34:42.5360450Z 2025-09-07T07:34:42.5360536Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5360762Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5360767Z 2025-09-07T07:34:42.5360869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5360955Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5360995Z inline_call [] 2025-09-07T07:34:42.5361058Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5361098Z inductor [] 2025-09-07T07:34:42.5361206Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5362443Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5362751Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5362883Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5362943Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5363144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5363245Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5363399Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5363539Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5363638Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5363678Z inline_call [] 2025-09-07T07:34:42.5363741Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5363823Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5363906Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5364205Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5364334Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5364394Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5364572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5364672Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5364822Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5364960Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5365042Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5365100Z inline_call [] 2025-09-07T07:34:42.5366296Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5366379Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5366460Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5366866Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5366995Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5367053Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5367227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5367324Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5367475Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5367615Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5367866Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-1db72ca20937eb31.xml - 2025-09-07T07:34:42.5367962Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5368380Z FAILED [0.7481s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5368477Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5368480Z 2025-09-07T07:34:42.5368719Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5368744Z 2025-09-07T07:34:42.5368746Z 2025-09-07T07:34:42.5368830Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5369059Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5369061Z 2025-09-07T07:34:42.5369180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5369251Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.5369325Z ================== 1 failed, 65 deselected, 2 rerun in 2.59s =================== 2025-09-07T07:34:42.5369367Z Got exit code 1 2025-09-07T07:34:42.5370566Z Retrying single test... 2025-09-07T07:34:42.5371061Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5371109Z import pkg_resources 2025-09-07T07:34:42.5371310Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-8a0dfbc5ee438824.xml 2025-09-07T07:34:42.5371376Z ============================= test session starts ============================== 2025-09-07T07:34:42.5371507Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5371551Z cachedir: .pytest_cache 2025-09-07T07:34:42.5371732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5371783Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5371853Z configfile: pytest.ini 2025-09-07T07:34:42.5372044Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5372134Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.5372399Z stepcurrent: skipping 65 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5372448Z Running 1 items in this shard 2025-09-07T07:34:42.5372452Z 2025-09-07T07:34:42.5372679Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9363s] [100%] 2025-09-07T07:34:42.5372904Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.6965s] [100%] 2025-09-07T07:34:42.5373097Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True FAILED [0.8191s] [100%] 2025-09-07T07:34:42.5373101Z 2025-09-07T07:34:42.5373159Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5373281Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5373333Z Traceback (most recent call last): 2025-09-07T07:34:42.5373525Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5374703Z self._run_test( 2025-09-07T07:34:42.5374837Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5374903Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5374949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5375107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5375185Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5375231Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5375407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5375461Z return 
self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5375506Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5375683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5375734Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5375777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5375945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5376038Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5376086Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5376263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5376319Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5376586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5376650Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5376698Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5376864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5378074Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5378121Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5378256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5378368Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5378418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5378566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5378638Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.5378688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5378851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5378904Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5378947Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5379109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5379154Z return aot_autograd( 2025-09-07T07:34:42.5379198Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5379357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5379438Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5379491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5379706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5379804Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5379857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5380068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5381251Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5381472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5381545Z fx_g = _create_graph( 2025-09-07T07:34:42.5381585Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5381778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5381818Z fx_g = make_fx( 2025-09-07T07:34:42.5381856Z 
^^^^^^^^ 2025-09-07T07:34:42.5382056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5382111Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5382156Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5382326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5382376Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5382421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5382606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5382649Z t = dispatch_trace( 2025-09-07T07:34:42.5382690Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5382821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5382872Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5382914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5383060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5383107Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5383148Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5384477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5384594Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5384646Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5384793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5384837Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5384878Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5385026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.5385075Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5385116Z ^^^^^^^^^ 2025-09-07T07:34:42.5385271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5385319Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5385359Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5385534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5385596Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5385636Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5385819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5385892Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5385958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5386166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5386212Z outs_pair = fn(*args) 2025-09-07T07:34:42.5386254Z ^^^^^^^^^ 2025-09-07T07:34:42.5386454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5387803Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5387882Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5388091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5388135Z outs_pair = fn(*args) 2025-09-07T07:34:42.5388176Z ^^^^^^^^^ 2025-09-07T07:34:42.5388408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5388480Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5388529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5388760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5388843Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5388899Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5389101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5389147Z outs_pair = fn(*args) 2025-09-07T07:34:42.5389186Z ^^^^^^^^^ 2025-09-07T07:34:42.5389411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5389465Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5389508Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5389706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5389759Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5389827Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5389976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5391169Z return handle_torch_function( 2025-09-07T07:34:42.5391214Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5391381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5391469Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5391523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5391721Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5391770Z return func(*args, **kwargs) 2025-09-07T07:34:42.5391811Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5391957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5392011Z result = _engine_run_backward( 2025-09-07T07:34:42.5392052Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5392224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5392367Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5392450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5392601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5392649Z return user_fn(self, *args) 2025-09-07T07:34:42.5392693Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5392862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5392916Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5392977Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5393163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5393215Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5394400Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5394549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5394614Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5394657Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5394852Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5394913Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5394958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5395122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5395183Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5395228Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5395420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5395478Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5395524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5395713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5395758Z t = dispatch_trace( 2025-09-07T07:34:42.5395799Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5395933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5395983Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5396047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5396194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5396239Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5397514Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5397709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5397803Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5397853Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5398002Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5398047Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5398087Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5398236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5398290Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5398330Z ^^^^^^^^^ 2025-09-07T07:34:42.5398506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5398566Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5398605Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5398682Z File "", line 1, in 2025-09-07T07:34:42.5398856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5398949Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5399002Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5399163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5399221Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5399291Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5399519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5399570Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5400873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5401121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5401174Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5401219Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5401389Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5401439Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5401480Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5401644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5401748Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5401802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5401951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5402023Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5402073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5402223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5402268Z leaves = list(leaves) 2025-09-07T07:34:42.5402310Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5402457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5402533Z return func(x) 2025-09-07T07:34:42.5402573Z ^^^^^^^ 2025-09-07T07:34:42.5402735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5402812Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5402860Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5404226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5404274Z return func(*args, **kwargs) 2025-09-07T07:34:42.5404317Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5404534Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5404637Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.5404644Z 2025-09-07T07:34:42.5404893Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5404897Z 2025-09-07T07:34:42.5404900Z 2025-09-07T07:34:42.5404987Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5405239Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5405242Z 2025-09-07T07:34:42.5405344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5405432Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5405474Z inline_call [] 2025-09-07T07:34:42.5405537Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5405579Z inductor [] 2025-09-07T07:34:42.5405666Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5405771Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5406078Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5406213Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5406273Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5406473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5406640Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5406798Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5408118Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5408248Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5408299Z Traceback (most recent call last): 2025-09-07T07:34:42.5408475Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5408517Z self._run_test( 2025-09-07T07:34:42.5408653Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5408719Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5408768Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5408924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5408979Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5409025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5409236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5409291Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5409338Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5409499Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5409553Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5409597Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5409768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5409864Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5409911Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5410095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5411308Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5411489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5411553Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5411600Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5411796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5411859Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5411905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5412043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5412120Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5412173Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5412348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5412423Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5412471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5412637Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5412690Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5412754Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5412918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5412965Z return aot_autograd( 2025-09-07T07:34:42.5413007Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5413168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5413251Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5413308Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5414658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5414760Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5414816Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5415037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5415087Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5415309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5415355Z fx_g = _create_graph( 2025-09-07T07:34:42.5415424Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5415619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5415661Z fx_g = make_fx( 2025-09-07T07:34:42.5415699Z ^^^^^^^^ 2025-09-07T07:34:42.5415879Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5415934Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5415980Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5416154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5416204Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5416247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5416436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5416557Z t = dispatch_trace( 2025-09-07T07:34:42.5416597Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5416734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5416782Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5417991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5418167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5418217Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5418258Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5418451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5418544Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5418594Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5418744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5418814Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5418855Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5419005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5419054Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5419097Z ^^^^^^^^^ 2025-09-07T07:34:42.5419273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5419322Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5419363Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5419540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5419600Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5419641Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5419830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5419903Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5421112Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5421325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5421372Z outs_pair = fn(*args) 2025-09-07T07:34:42.5421414Z ^^^^^^^^^ 2025-09-07T07:34:42.5421619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5421697Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5421751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5421985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5422034Z outs_pair = fn(*args) 2025-09-07T07:34:42.5422074Z ^^^^^^^^^ 2025-09-07T07:34:42.5422289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5422360Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5422412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5422644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5422726Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5422779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5422986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5423033Z outs_pair = fn(*args) 2025-09-07T07:34:42.5423074Z ^^^^^^^^^ 2025-09-07T07:34:42.5423300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5423372Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5423416Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5424772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5424828Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5424873Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5425021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5425075Z return handle_torch_function( 2025-09-07T07:34:42.5425138Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5425307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5425396Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5425451Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5425677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5425727Z return func(*args, **kwargs) 2025-09-07T07:34:42.5425769Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5425916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5425966Z result = _engine_run_backward( 2025-09-07T07:34:42.5426007Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5426185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5426329Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5426388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5426610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5426662Z return user_fn(self, *args) 2025-09-07T07:34:42.5426705Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5428052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5428105Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5428149Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5428337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5428423Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5428466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5428614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5428661Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5428703Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5428901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5428963Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5429010Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5429174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5429232Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5429282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5429475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5429533Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5429579Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5429788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5429836Z t = dispatch_trace( 2025-09-07T07:34:42.5429877Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5430012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5431220Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5431265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5431412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5431489Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5431530Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5431722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5431815Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5431864Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5432036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5432083Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5432123Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5432274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5432321Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5432363Z ^^^^^^^^^ 2025-09-07T07:34:42.5432542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5432604Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5432644Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5432694Z File "", line 1, in 2025-09-07T07:34:42.5432866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5432961Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5433016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5434332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5434390Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5434436Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5434663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5434740Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5434782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5434985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5435039Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5435082Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5435253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5435303Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5435346Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5435504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5435611Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5435665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5435814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5435886Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5435956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5436109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5436155Z leaves = list(leaves) 2025-09-07T07:34:42.5436195Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5436343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5437631Z return func(x) 2025-09-07T07:34:42.5437673Z ^^^^^^^ 2025-09-07T07:34:42.5437841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5437952Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5438001Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5438199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5438248Z return func(*args, **kwargs) 2025-09-07T07:34:42.5438291Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5438528Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5438630Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5438633Z 2025-09-07T07:34:42.5438878Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5438885Z 2025-09-07T07:34:42.5438887Z 2025-09-07T07:34:42.5438974Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5439204Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5439206Z 2025-09-07T07:34:42.5439311Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5439399Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5439441Z inline_call [] 2025-09-07T07:34:42.5439507Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5439548Z inductor [] 2025-09-07T07:34:42.5439635Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5439720Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5440056Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5441415Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5441479Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5441663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5441765Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5441923Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5442065Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5442150Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5442196Z inline_call [] 2025-09-07T07:34:42.5442259Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5442345Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5442428Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5442756Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5442889Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5442950Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5443128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5443229Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5443383Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5443547Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5443606Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5443732Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5443783Z Traceback (most recent call last): 2025-09-07T07:34:42.5443977Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5445177Z self._run_test( 2025-09-07T07:34:42.5445314Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5445378Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5445427Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5445587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5445641Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5445688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5445867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5445923Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5445970Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5446133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5446186Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5446229Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5446399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5446643Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5446690Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5446870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5446925Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5447103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5447166Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5447215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5448556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5448618Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5448664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5448805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5448885Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5448938Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5449089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5449194Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5449245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5449413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5449464Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5449508Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5449671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5449754Z return aot_autograd( 2025-09-07T07:34:42.5449795Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5449957Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5450038Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5450093Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5450308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5450407Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5450460Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5450677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5451893Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5452116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5452163Z fx_g = _create_graph( 2025-09-07T07:34:42.5452206Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5452401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5452443Z fx_g = make_fx( 2025-09-07T07:34:42.5452483Z ^^^^^^^^ 2025-09-07T07:34:42.5452670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5452724Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5452769Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5452944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5453023Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5453067Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5453256Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5453301Z t = dispatch_trace( 2025-09-07T07:34:42.5453341Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5453479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5453530Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5453573Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5453721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5453768Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5453810Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5455161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5455259Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5455307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5455456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5455503Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5455563Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5455717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5455766Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5455807Z ^^^^^^^^^ 2025-09-07T07:34:42.5455964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5456012Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5456055Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5456261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5456319Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5456359Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5456635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5456709Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5456791Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5457000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5457049Z outs_pair = fn(*args) 2025-09-07T07:34:42.5457090Z ^^^^^^^^^ 2025-09-07T07:34:42.5458466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5458552Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5458605Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5458810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5458858Z outs_pair = fn(*args) 2025-09-07T07:34:42.5458899Z ^^^^^^^^^ 2025-09-07T07:34:42.5459113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5459183Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5459234Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5459465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5459581Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5459634Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5459841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5459887Z outs_pair = fn(*args) 2025-09-07T07:34:42.5459929Z ^^^^^^^^^ 2025-09-07T07:34:42.5460155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5460210Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5460253Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5460455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5460511Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5460555Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5460703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5461918Z return handle_torch_function( 2025-09-07T07:34:42.5461962Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5462159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5462251Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5462304Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5462503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5462551Z return func(*args, **kwargs) 2025-09-07T07:34:42.5462593Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5462764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5462814Z result = _engine_run_backward( 2025-09-07T07:34:42.5462856Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5463030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5463175Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5463250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5463402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5463452Z return user_fn(self, *args) 2025-09-07T07:34:42.5463496Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5463669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5463722Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5463766Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5463953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5465163Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5465209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5465358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5465406Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5465449Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5465646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5465708Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5465785Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5465951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5466008Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5466054Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5466248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5466306Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5466352Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5466656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5466702Z t = dispatch_trace( 2025-09-07T07:34:42.5466742Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5466878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5466931Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5466975Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5467123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5467169Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5468382Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5468606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5468699Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5468749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5468897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5468943Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5468984Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5469166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5469215Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5469256Z ^^^^^^^^^ 2025-09-07T07:34:42.5469433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5469492Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5469532Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5469603Z File "", line 1, in 
2025-09-07T07:34:42.5469774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5469869Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5469921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5470082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5470141Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5470185Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5470414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5471632Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5471676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5471884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5471938Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5471981Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5472153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5472233Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5472278Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5472438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5472543Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5472596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5472747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5472819Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5472870Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5473019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5473065Z leaves = list(leaves) 2025-09-07T07:34:42.5473107Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5473256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5473297Z return func(x) 2025-09-07T07:34:42.5473336Z ^^^^^^^ 2025-09-07T07:34:42.5473499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5473576Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5474810Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5475013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5475062Z return func(*args, **kwargs) 2025-09-07T07:34:42.5475104Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5475320Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5475424Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5475444Z 2025-09-07T07:34:42.5475693Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5475696Z 2025-09-07T07:34:42.5475698Z 2025-09-07T07:34:42.5475784Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5476030Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5476033Z 2025-09-07T07:34:42.5476136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5476223Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5476265Z inline_call [] 2025-09-07T07:34:42.5476330Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5476373Z inductor [] 2025-09-07T07:34:42.5476461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5476614Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5476923Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5477061Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5477121Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5477301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5477402Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5478727Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5478907Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5478992Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5479032Z inline_call [] 2025-09-07T07:34:42.5479096Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5479182Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5479266Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5479570Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5479701Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5479761Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5479945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5480046Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5480244Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5480408Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5480492Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5480534Z inline_call [] 2025-09-07T07:34:42.5480598Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5480682Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5480764Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5481068Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5481219Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5481278Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5482627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5482755Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5482908Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5483049Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5483307Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-8a0dfbc5ee438824.xml - 2025-09-07T07:34:42.5483381Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5483815Z FAILED [0.8191s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5483915Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5483919Z 2025-09-07T07:34:42.5484165Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5484169Z 2025-09-07T07:34:42.5484170Z 2025-09-07T07:34:42.5484254Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5484505Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5484509Z 2025-09-07T07:34:42.5484610Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5484681Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.5484759Z ================== 1 failed, 245 deselected, 2 rerun in 2.74s ================== 2025-09-07T07:34:42.5484802Z Got exit code 1 2025-09-07T07:34:42.5484848Z Retrying single test... 2025-09-07T07:34:42.5485353Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5485398Z import pkg_resources 2025-09-07T07:34:42.5485605Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-36d5cfe937bbecc8.xml 2025-09-07T07:34:42.5485672Z ============================= test session starts ============================== 2025-09-07T07:34:42.5485807Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5487099Z cachedir: .pytest_cache 2025-09-07T07:34:42.5487319Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5487374Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5487420Z configfile: pytest.ini 2025-09-07T07:34:42.5487612Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5487704Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.5487973Z stepcurrent: skipping 65 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5488049Z Running 1 items in this shard 2025-09-07T07:34:42.5488053Z 2025-09-07T07:34:42.5488285Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1008s] [100%] 2025-09-07T07:34:42.5488540Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7665s] [100%] 2025-09-07T07:34:42.5488737Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True FAILED [0.7031s] [100%] 2025-09-07T07:34:42.5488740Z 2025-09-07T07:34:42.5488798Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5488924Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5488978Z Traceback (most recent call last): 2025-09-07T07:34:42.5489158Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5489201Z self._run_test( 2025-09-07T07:34:42.5489336Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5489403Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5489453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5489615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5489671Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5489717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5491077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5491161Z return 
self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5491212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5491375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5491428Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5491472Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5491646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5491743Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5491790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5491972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5492029Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5492207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5492276Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5492324Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5492492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5492553Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5492618Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5492758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5492838Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5492890Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5493041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5494279Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.5494351Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5494523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5494577Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5494621Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5494825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5494873Z return aot_autograd( 2025-09-07T07:34:42.5494914Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5495077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5495158Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5495213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5495409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5495509Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5495561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5495780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5495832Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5496054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5496101Z fx_g = _create_graph( 2025-09-07T07:34:42.5496143Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5496336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5496410Z fx_g = make_fx( 2025-09-07T07:34:42.5496449Z 
^^^^^^^^ 2025-09-07T07:34:42.5497949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5498005Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5498051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5498228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5498280Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5498323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5498513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5498557Z t = dispatch_trace( 2025-09-07T07:34:42.5498600Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5498737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5498790Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5498832Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5498981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5499029Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5499104Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5499299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5499393Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5499442Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5499590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5499636Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5499703Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5501023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.5501074Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5501116Z ^^^^^^^^^ 2025-09-07T07:34:42.5501276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5501326Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5501393Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5501573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5501631Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5501672Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5501861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5501940Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5501992Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5502203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5502249Z outs_pair = fn(*args) 2025-09-07T07:34:42.5502291Z ^^^^^^^^^ 2025-09-07T07:34:42.5502500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5502579Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5502631Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5502839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5502911Z outs_pair = fn(*args) 2025-09-07T07:34:42.5502954Z ^^^^^^^^^ 2025-09-07T07:34:42.5503166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5504507Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5504558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5504794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5504876Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5504930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5505136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5505184Z outs_pair = fn(*args) 2025-09-07T07:34:42.5505226Z ^^^^^^^^^ 2025-09-07T07:34:42.5505451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5505505Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5505548Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5505771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5505829Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5505875Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5506025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5506077Z return handle_torch_function( 2025-09-07T07:34:42.5506119Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5506288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5506396Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5506450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5506741Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5507967Z return func(*args, **kwargs) 2025-09-07T07:34:42.5508011Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5508190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5508240Z result = _engine_run_backward( 2025-09-07T07:34:42.5508284Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5508458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5508604Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5508665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5508816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5508865Z return user_fn(self, *args) 2025-09-07T07:34:42.5508910Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5509084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5509136Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5509179Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5509369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5509424Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5509493Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5509643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5509689Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5509731Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5509929Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5509991Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5511202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5511368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5511426Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5511473Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5511666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5511728Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5511773Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5511964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5512009Z t = dispatch_trace( 2025-09-07T07:34:42.5512050Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5512215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5512267Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5512310Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5512459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5512505Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5512545Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5512742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5512858Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5512907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5513055Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5513102Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5513142Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5514479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5514530Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5514571Z ^^^^^^^^^ 2025-09-07T07:34:42.5514749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5514809Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5514851Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5514901Z File "", line 1, in 2025-09-07T07:34:42.5515073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5515165Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5515221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5515386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5515442Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5515488Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5515718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5515794Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5515838Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5516045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5516097Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5516141Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5516315Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5516367Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5517672Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5517838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5517943Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5517998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5518152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5518224Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5518275Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5518425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5518503Z leaves = list(leaves) 2025-09-07T07:34:42.5518545Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5518694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5518736Z return func(x) 2025-09-07T07:34:42.5518776Z ^^^^^^^ 2025-09-07T07:34:42.5518941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5519019Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5519092Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5519294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5519342Z return func(*args, **kwargs) 2025-09-07T07:34:42.5519386Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5519605Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5519736Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.5519740Z 2025-09-07T07:34:42.5519986Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5519989Z 2025-09-07T07:34:42.5519991Z 2025-09-07T07:34:42.5521346Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5521582Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5521585Z 2025-09-07T07:34:42.5521688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5521777Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5521822Z inline_call [] 2025-09-07T07:34:42.5521886Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5521930Z inductor [] 2025-09-07T07:34:42.5522018Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5522104Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5522416Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5522586Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5522646Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5522831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5522934Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5523094Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5523237Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5523362Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5523414Z Traceback (most recent call last): 2025-09-07T07:34:42.5523590Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5523634Z self._run_test( 2025-09-07T07:34:42.5523771Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5525003Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5525053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5525236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5525295Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5525341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5525523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5525578Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5525625Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5525785Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5525856Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5525901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5526069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5526166Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5526213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5526411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5526465Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5526720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5526783Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5526833Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5527004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5527065Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5527110Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5528433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5528514Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5528567Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5528717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5528793Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5528841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5529043Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5529097Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5529142Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5529306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5529354Z return aot_autograd( 2025-09-07T07:34:42.5529396Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5529561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5529642Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5529696Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5529888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5529989Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5530044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5530259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5530311Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5530559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5531769Z fx_g = _create_graph( 2025-09-07T07:34:42.5531813Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5532009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5532049Z fx_g = make_fx( 2025-09-07T07:34:42.5532089Z ^^^^^^^^ 2025-09-07T07:34:42.5532273Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5532357Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5532402Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5532577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5532629Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5532673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5532884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5532931Z t = dispatch_trace( 2025-09-07T07:34:42.5532971Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5533107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5533156Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5533200Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5533349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5533397Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5533439Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5533634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5533727Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5534943Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5535094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5535139Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5535180Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5535332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5535410Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5535452Z ^^^^^^^^^ 2025-09-07T07:34:42.5535610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5535658Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5535700Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5535877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5535938Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5535978Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5536165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5536238Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5536291Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5536568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5536617Z outs_pair = fn(*args) 2025-09-07T07:34:42.5536658Z ^^^^^^^^^ 2025-09-07T07:34:42.5536865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5536969Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5538199Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5538412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5538459Z outs_pair = fn(*args) 2025-09-07T07:34:42.5538499Z ^^^^^^^^^ 2025-09-07T07:34:42.5538711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5538815Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5538866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5539097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5539181Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5539261Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5539468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5539514Z outs_pair = fn(*args) 2025-09-07T07:34:42.5539555Z ^^^^^^^^^ 2025-09-07T07:34:42.5539782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5539840Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5539884Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5540087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5540143Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5540187Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5540339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5540389Z return handle_torch_function( 2025-09-07T07:34:42.5540432Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5541757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5541847Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5541931Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5542131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5542179Z return func(*args, **kwargs) 2025-09-07T07:34:42.5542222Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5542371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5542422Z result = _engine_run_backward( 2025-09-07T07:34:42.5542465Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5542641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5542784Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5542843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5542998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5543048Z return user_fn(self, *args) 2025-09-07T07:34:42.5543091Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5543265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5543333Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5543378Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5543567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5543620Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5543663Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5544971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5545051Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5545094Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5545291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5545354Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5545400Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5545564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5545642Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5545688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5545883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5545939Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5545987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5546180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5546226Z t = dispatch_trace( 2025-09-07T07:34:42.5546266Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5546401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5546453Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5546586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5546736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5546783Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5546824Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5547015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5548282Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5548367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5548517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5548564Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5548605Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5548758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5548806Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5548848Z ^^^^^^^^^ 2025-09-07T07:34:42.5549026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5549083Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5549125Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5549174Z File "", line 1, in 2025-09-07T07:34:42.5549349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5549443Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5549497Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5549658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5549738Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5549787Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5550017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5550068Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5550110Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5550314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5551561Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5551605Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5551777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5551827Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5551871Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5552058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5552165Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5552219Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5552368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5552442Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5552496Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5552646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5552692Z leaves = list(leaves) 2025-09-07T07:34:42.5552732Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5552879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5552922Z return func(x) 2025-09-07T07:34:42.5552962Z ^^^^^^^ 2025-09-07T07:34:42.5553127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5553204Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5553254Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5553454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5554679Z return func(*args, **kwargs) 2025-09-07T07:34:42.5554723Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5554940Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5555040Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5555044Z 2025-09-07T07:34:42.5555294Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5555298Z 2025-09-07T07:34:42.5555300Z 2025-09-07T07:34:42.5555386Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5555614Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5555621Z 2025-09-07T07:34:42.5555723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5555810Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5555853Z inline_call [] 2025-09-07T07:34:42.5555918Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5555959Z inductor [] 2025-09-07T07:34:42.5556068Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5556156Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5556465Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5556677Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5556742Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5556950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5557053Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5557207Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5557354Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5557458Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5558681Z inline_call [] 2025-09-07T07:34:42.5558745Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5558832Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5558914Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5559220Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5559354Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5559414Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5559595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5559698Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5559850Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5559992Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5560051Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5560278Z _ WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5560330Z Traceback (most recent call last): 2025-09-07T07:34:42.5560509Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5560551Z self._run_test( 2025-09-07T07:34:42.5560687Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5560752Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5560801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5560959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5561014Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5562233Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5562415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5562474Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5562521Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5562682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5562735Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5562807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5562978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5563076Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5563122Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5563303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5563358Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5563556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5563619Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5563667Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5563836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5563898Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5563961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5564101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5564180Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5564233Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5564384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5565652Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5565707Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5565902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5566727Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5566781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5566947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5566995Z return aot_autograd( 2025-09-07T07:34:42.5567036Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5567199Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5567281Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5567367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5567559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5567659Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5567714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5567933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5567984Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5568205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5568253Z fx_g = _create_graph( 2025-09-07T07:34:42.5568294Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5568493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5568533Z fx_g = make_fx( 2025-09-07T07:34:42.5569739Z ^^^^^^^^ 2025-09-07T07:34:42.5569921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5570010Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5570056Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5570232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5570283Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5570327Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5570515Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5570564Z t = dispatch_trace( 2025-09-07T07:34:42.5570628Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5570763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5570811Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5570855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5571004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5571052Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5571117Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5571313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5571406Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5571455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5571602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5571652Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5571693Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5573000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5573050Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5573091Z ^^^^^^^^^ 2025-09-07T07:34:42.5573251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5573300Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5573342Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5573518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5573578Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5573618Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5573829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5573904Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5573957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5574166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5574213Z outs_pair = fn(*args) 2025-09-07T07:34:42.5574254Z ^^^^^^^^^ 2025-09-07T07:34:42.5574460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5574539Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5574591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5574798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5574847Z outs_pair = fn(*args) 2025-09-07T07:34:42.5574887Z ^^^^^^^^^ 2025-09-07T07:34:42.5576252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5576324Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5576397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5576716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5576800Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5576852Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5577062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5577141Z outs_pair = fn(*args) 2025-09-07T07:34:42.5577183Z ^^^^^^^^^ 2025-09-07T07:34:42.5577410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5577464Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5577507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5577751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5577807Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5577851Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5578002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5578052Z return handle_torch_function( 2025-09-07T07:34:42.5578096Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5578266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5578355Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5578407Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5578610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5579833Z return func(*args, **kwargs) 2025-09-07T07:34:42.5579878Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5580027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5580077Z result = _engine_run_backward( 2025-09-07T07:34:42.5580119Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5580293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5580465Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5580524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5580673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5580724Z return user_fn(self, *args) 2025-09-07T07:34:42.5580769Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5580941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5580991Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5581035Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5581225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5581280Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5581323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5581471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5581518Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5581559Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5581778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5582999Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5583047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5583211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5583270Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5583315Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5583534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5583590Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5583636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5583823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5583870Z t = dispatch_trace( 2025-09-07T07:34:42.5583910Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5584061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5584112Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5584155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5584302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5584350Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5584392Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5584583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5584675Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5584724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5584872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5584918Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5586104Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5586256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5586304Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5586345Z ^^^^^^^^^ 2025-09-07T07:34:42.5586596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5586689Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5586730Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5586778Z File "", line 1, in 
2025-09-07T07:34:42.5586952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5587045Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5587100Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5587260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5587316Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5587360Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5587587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5587642Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5587684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5587887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5587939Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5588006Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5588179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5588228Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5589431Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5589591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5589694Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5589779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5589928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5589997Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5590049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5590199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5590269Z leaves = list(leaves) 2025-09-07T07:34:42.5590311Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5590459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5590500Z return func(x) 2025-09-07T07:34:42.5590539Z ^^^^^^^ 2025-09-07T07:34:42.5590703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5590783Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5590832Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5591029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5591077Z return func(*args, **kwargs) 2025-09-07T07:34:42.5591121Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5591337Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5591437Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5591439Z 2025-09-07T07:34:42.5592830Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5592863Z 2025-09-07T07:34:42.5592868Z 2025-09-07T07:34:42.5592956Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5593185Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5593188Z 2025-09-07T07:34:42.5593288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5593376Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5593420Z inline_call [] 2025-09-07T07:34:42.5593483Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5593523Z inductor [] 2025-09-07T07:34:42.5593610Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5593694Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5594006Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5594145Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5594207Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5594388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5594507Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5594665Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5594806Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5594890Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5594931Z inline_call [] 2025-09-07T07:34:42.5594997Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5595103Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5595185Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5596725Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5596892Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5596952Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5597130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5597231Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5597385Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5597530Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5597613Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5597654Z inline_call [] 2025-09-07T07:34:42.5597717Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5597804Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5597886Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5598190Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5598318Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5598402Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5598580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5598681Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5598833Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5598974Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5599232Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-36d5cfe937bbecc8.xml - 2025-09-07T07:34:42.5599301Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5600965Z FAILED [0.7031s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5601073Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5601076Z 2025-09-07T07:34:42.5601318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5601348Z 2025-09-07T07:34:42.5601351Z 2025-09-07T07:34:42.5601438Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5601666Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5601670Z 2025-09-07T07:34:42.5601770Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5601840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.5601940Z ================== 1 failed, 245 deselected, 2 rerun in 2.80s ================== 2025-09-07T07:34:42.5601981Z Got exit code 1 2025-09-07T07:34:42.5602128Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.5602654Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5602703Z import pkg_resources 2025-09-07T07:34:42.5602904Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5531b6dec9b76b42.xml 2025-09-07T07:34:42.5602971Z ============================= test session starts ============================== 2025-09-07T07:34:42.5603106Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5603155Z cachedir: .pytest_cache 2025-09-07T07:34:42.5603338Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5603390Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5603435Z configfile: pytest.ini 2025-09-07T07:34:42.5603628Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5603715Z collecting ... collected 467 items / 66 deselected / 401 selected 2025-09-07T07:34:42.5604935Z stepcurrent: skipping 66 already run items. 
2025-09-07T07:34:42.5604986Z Running 180 items in this shard 2025-09-07T07:34:42.5604988Z 2025-09-07T07:34:42.5605197Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False_autograd_False PASSED [2.1032s] [ 0%] 2025-09-07T07:34:42.5605453Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.0105s] [ 1%] 2025-09-07T07:34:42.5605682Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.8500s] [ 1%] 2025-09-07T07:34:42.5605881Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True FAILED [0.7326s] [ 1%] 2025-09-07T07:34:42.5605886Z 2025-09-07T07:34:42.5605944Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5606071Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5606122Z Traceback (most recent call last): 2025-09-07T07:34:42.5606303Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5606349Z self._run_test( 2025-09-07T07:34:42.5606566Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5606634Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5606682Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5606842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5606924Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5606974Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5607155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in 
_wrapped_call_impl 2025-09-07T07:34:42.5607210Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5607254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5607417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5607493Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5608708Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5608881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5608978Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5609025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5609232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5609286Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5609467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5609529Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5609577Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5609751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5609812Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5609857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5609995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5610078Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5610130Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5610282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5610356Z return 
lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5610406Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5610571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5610651Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5610695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5612023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5612071Z return aot_autograd( 2025-09-07T07:34:42.5612114Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5612280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5612362Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5612415Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5612606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5612704Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5612761Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5612981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5613033Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5613290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5613340Z fx_g = _create_graph( 2025-09-07T07:34:42.5613382Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5613576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5613618Z 
fx_g = make_fx( 2025-09-07T07:34:42.5613658Z ^^^^^^^^ 2025-09-07T07:34:42.5613839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5613913Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5613957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5614131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5614183Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5615372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5615589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5615635Z t = dispatch_trace( 2025-09-07T07:34:42.5615675Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5615810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5615860Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5615901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5616054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5616100Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5616143Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5616333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5616427Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5616477Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5616693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5616738Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5616780Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5616930Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5617017Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5617059Z ^^^^^^^^^ 2025-09-07T07:34:42.5617216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5617264Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5618469Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5618651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5618712Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5618751Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5618939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5619011Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5619062Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5619273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5619322Z outs_pair = fn(*args) 2025-09-07T07:34:42.5619364Z ^^^^^^^^^ 2025-09-07T07:34:42.5619568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5619673Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5619728Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5619934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5619979Z outs_pair = fn(*args) 2025-09-07T07:34:42.5620020Z ^^^^^^^^^ 2025-09-07T07:34:42.5620232Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5620329Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5620378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5620612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5620695Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5620749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5622128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5622175Z outs_pair = fn(*args) 2025-09-07T07:34:42.5622215Z ^^^^^^^^^ 2025-09-07T07:34:42.5622442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5622498Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5622542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5622743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5622797Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5622839Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5622992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5623041Z return handle_torch_function( 2025-09-07T07:34:42.5623084Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5623253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5623342Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.5623415Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5623619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5623667Z return func(*args, **kwargs) 2025-09-07T07:34:42.5623709Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5623859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5623908Z result = _engine_run_backward( 2025-09-07T07:34:42.5623952Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5624126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5625423Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5625480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5625635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5625684Z return user_fn(self, *args) 2025-09-07T07:34:42.5625727Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5625899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5625971Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5626015Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5626204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5626256Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5626299Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5626446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5626567Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.5626631Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5626828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5626888Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5626936Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5627097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5627177Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5627223Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5627415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5628629Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5628676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5628866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5628914Z t = dispatch_trace( 2025-09-07T07:34:42.5628955Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5629089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5629139Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5629183Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5629332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5629377Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5629419Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5629609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5629702Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5629782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5629930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5629975Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5630015Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5630164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5630215Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5630255Z ^^^^^^^^^ 2025-09-07T07:34:42.5630433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5630490Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5631671Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5631722Z File "", line 1, in 2025-09-07T07:34:42.5631894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5631989Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5632041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5632205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5632306Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5632351Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5632581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5632631Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5632673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5632879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5632951Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5632996Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.5633167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5633218Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5633259Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5633437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5633541Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5633596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5633743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5634959Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5635014Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5635164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5635209Z leaves = list(leaves) 2025-09-07T07:34:42.5635250Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5635394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5635438Z return func(x) 2025-09-07T07:34:42.5635477Z ^^^^^^^ 2025-09-07T07:34:42.5635642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5635718Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5635767Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5635962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5636033Z return func(*args, **kwargs) 2025-09-07T07:34:42.5636075Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5636288Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5636389Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5636391Z 2025-09-07T07:34:42.5636693Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5636696Z 2025-09-07T07:34:42.5636698Z 2025-09-07T07:34:42.5636783Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5637013Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5637017Z 2025-09-07T07:34:42.5637118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5637205Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5637246Z inline_call [] 2025-09-07T07:34:42.5638473Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5638562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5638677Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5638985Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5639119Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5639179Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5639360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5639495Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5639654Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5639796Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5639947Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5639998Z Traceback (most recent call last): 2025-09-07T07:34:42.5640255Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5640297Z self._run_test( 2025-09-07T07:34:42.5640432Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5640500Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5640549Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5640705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5640759Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5640806Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5640985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5642202Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5642250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5642412Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5642463Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5642508Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5642710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5642809Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5642854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5643035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5643089Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5643269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5643331Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5643378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5643545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5643605Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5643653Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5643791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5643867Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5643918Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5644084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5644160Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5644209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5645519Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5645572Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5645617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5645782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5645853Z return aot_autograd( 2025-09-07T07:34:42.5645894Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5646054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5646137Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5646189Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5646396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5646547Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5646602Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5646819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5646873Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5647091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5647139Z fx_g = _create_graph( 2025-09-07T07:34:42.5647180Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5647375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5647415Z fx_g = make_fx( 2025-09-07T07:34:42.5647455Z ^^^^^^^^ 2025-09-07T07:34:42.5647635Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5648845Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5648891Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5649101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5649151Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5649194Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5649382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5649429Z t = dispatch_trace( 2025-09-07T07:34:42.5649469Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5649604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5649652Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5649694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5649841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5649888Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5649932Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5650124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5650216Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5650264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5650433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5650479Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5650522Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5650672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5650722Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5651907Z ^^^^^^^^^ 2025-09-07T07:34:42.5652066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5652143Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5652185Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5652361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5652420Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5652459Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5652645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5652738Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5652790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5652997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5653043Z outs_pair = fn(*args) 2025-09-07T07:34:42.5653084Z ^^^^^^^^^ 2025-09-07T07:34:42.5653289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5653367Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5653419Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5653625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5653672Z outs_pair = fn(*args) 2025-09-07T07:34:42.5653711Z ^^^^^^^^^ 2025-09-07T07:34:42.5653920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5653990Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5655175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5655431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5655516Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5655569Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5655774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5655819Z outs_pair = fn(*args) 2025-09-07T07:34:42.5655859Z ^^^^^^^^^ 2025-09-07T07:34:42.5656086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5656138Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5656181Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5656380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5656439Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5656549Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5656700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5656749Z return handle_torch_function( 2025-09-07T07:34:42.5656819Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5656988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5657075Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5657126Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5657324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5657396Z return func(*args, **kwargs) 2025-09-07T07:34:42.5657438Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5658730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5658781Z result = _engine_run_backward( 2025-09-07T07:34:42.5658822Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5658998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5659162Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5659221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5659370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5659418Z return user_fn(self, *args) 2025-09-07T07:34:42.5659463Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5659635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5659685Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5659728Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5659914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5659964Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5660008Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5660152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5660199Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5660239Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5660433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5660517Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5660566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5660724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5661913Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5661959Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5662152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5662206Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5662252Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5662442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5662487Z t = dispatch_trace( 2025-09-07T07:34:42.5662530Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5662665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5662714Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5662757Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5662901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5662973Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5663014Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5663205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5663295Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5663342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5663488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5663555Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5663595Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5663742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5664922Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5664962Z ^^^^^^^^^ 2025-09-07T07:34:42.5665140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5665216Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5665257Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5665306Z File "", line 1, in 2025-09-07T07:34:42.5665475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5665564Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5665619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5665782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5665839Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5665882Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5666111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5666162Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5666204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5666404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5666457Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5666565Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5666736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5666813Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5666855Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5667012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5668265Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5668320Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5668471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5668540Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5668591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5668740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5668788Z leaves = list(leaves) 2025-09-07T07:34:42.5668829Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5668972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5669014Z return func(x) 2025-09-07T07:34:42.5669051Z ^^^^^^^ 2025-09-07T07:34:42.5669239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5669315Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5669365Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5669563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5669611Z return func(*args, **kwargs) 2025-09-07T07:34:42.5669652Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5669867Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5669988Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5669990Z 2025-09-07T07:34:42.5670233Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo"
2025-09-07T07:34:42.5670236Z
2025-09-07T07:34:42.5670239Z
2025-09-07T07:34:42.5670323Z To execute this test, run the following from the base repo dir:
2025-09-07T07:34:42.5671709Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True
2025-09-07T07:34:42.5671712Z
2025-09-07T07:34:42.5671815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-07T07:34:42.5671902Z ----------------------------- Captured stdout call -----------------------------
2025-09-07T07:34:42.5671943Z inline_call []
2025-09-07T07:34:42.5672010Z stats [('calls_captured', 6), ('unique_graphs', 1)]
2025-09-07T07:34:42.5672096Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-09-07T07:34:42.5672183Z ----------------------------- Captured stderr call -----------------------------
2025-09-07T07:34:42.5672489Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error:
2025-09-07T07:34:42.5672623Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward
2025-09-07T07:34:42.5672682Z return torch._higher_order_ops.while_loop(
2025-09-07T07:34:42.5672860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop
2025-09-07T07:34:42.5672960Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple())
2025-09-07T07:34:42.5673137Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.)
2025-09-07T07:34:42.5673280Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-09-07T07:34:42.5673363Z ----------------------------- Captured stdout call -----------------------------
2025-09-07T07:34:42.5673403Z inline_call []
2025-09-07T07:34:42.5673468Z stats [('calls_captured', 6), ('unique_graphs', 1)]
2025-09-07T07:34:42.5673552Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-09-07T07:34:42.5673634Z ----------------------------- Captured stderr call -----------------------------
2025-09-07T07:34:42.5673931Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error:
2025-09-07T07:34:42.5674061Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward
2025-09-07T07:34:42.5674122Z return torch._higher_order_ops.while_loop(
2025-09-07T07:34:42.5675432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop
2025-09-07T07:34:42.5675532Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple())
2025-09-07T07:34:42.5675702Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.)
2025-09-07T07:34:42.5675843Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5675902Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5676026Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5676077Z Traceback (most recent call last): 2025-09-07T07:34:42.5676249Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5676313Z self._run_test( 2025-09-07T07:34:42.5676447Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5676665Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5676713Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5676871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5676925Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5676995Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5677174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5677228Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5677274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5677435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5677492Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5677535Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5678866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5678963Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5679010Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5679190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5679245Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5679420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5679483Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5679562Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5679731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5679790Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5679835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5679970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5680047Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5680100Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5680329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5680403Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5680453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5680619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5680673Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5680715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5680878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5680923Z return aot_autograd( 2025-09-07T07:34:42.5682141Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5682309Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5682391Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5682443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5682634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5682733Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5682812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5683026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5683076Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5683312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5683358Z fx_g = _create_graph( 2025-09-07T07:34:42.5683400Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5683591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5683632Z fx_g = make_fx( 2025-09-07T07:34:42.5683670Z ^^^^^^^^ 2025-09-07T07:34:42.5683851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5683906Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5683950Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5684122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5684172Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5684215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5685545Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5685591Z t = dispatch_trace( 2025-09-07T07:34:42.5685632Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5685765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5685814Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5685888Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5686043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5686089Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5686131Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5686321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5686414Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5686462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5686684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5686731Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5686772Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5686920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5686973Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5687014Z ^^^^^^^^^ 2025-09-07T07:34:42.5687169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5687216Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5687257Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5687458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5688669Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5688711Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5688896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5688969Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5689019Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5689228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5689308Z outs_pair = fn(*args) 2025-09-07T07:34:42.5689350Z ^^^^^^^^^ 2025-09-07T07:34:42.5689550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5689628Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5689705Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5689911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5689956Z outs_pair = fn(*args) 2025-09-07T07:34:42.5689997Z ^^^^^^^^^ 2025-09-07T07:34:42.5690204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5690276Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5690325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5690554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5690638Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5690690Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5690894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5692075Z outs_pair = fn(*args) 2025-09-07T07:34:42.5692115Z ^^^^^^^^^ 2025-09-07T07:34:42.5692339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5692422Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5692465Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5692664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5692716Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5692761Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5692908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5692958Z return handle_torch_function( 2025-09-07T07:34:42.5693000Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5693165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5693252Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5693307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5693506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5693554Z return func(*args, **kwargs) 2025-09-07T07:34:42.5693595Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5693758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5693808Z result = _engine_run_backward( 2025-09-07T07:34:42.5693851Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5694022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5694163Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5695346Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5695523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5695572Z return user_fn(self, *args) 2025-09-07T07:34:42.5695614Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5695782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5695833Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5695876Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5696079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5696131Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5696173Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5696318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5696366Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5696410Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5696656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5696718Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5696764Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5696928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5696989Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5697034Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5697226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5697283Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5697327Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5698685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5698730Z t = dispatch_trace( 2025-09-07T07:34:42.5698770Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5698904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5698954Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5698997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5699144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5699189Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5699230Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5699416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5699508Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5699561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5699705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5699749Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5699788Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5699964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5700013Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5700055Z ^^^^^^^^^ 2025-09-07T07:34:42.5700231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5700288Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5700326Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5701498Z File "", line 1, in 
2025-09-07T07:34:42.5701669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5701788Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5701840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5702000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5702055Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5702124Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5702348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5702399Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5702440Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5702642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5702695Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5702739Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5702906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5702956Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5702998Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5703158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5703260Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5703313Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5703457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5703564Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5703616Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5704900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5704946Z leaves = list(leaves) 2025-09-07T07:34:42.5704987Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5705133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5705173Z return func(x) 2025-09-07T07:34:42.5705214Z ^^^^^^^ 2025-09-07T07:34:42.5705374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5705449Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5705497Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5705693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5705746Z return func(*args, **kwargs) 2025-09-07T07:34:42.5705788Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5706001Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5706101Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5706104Z 2025-09-07T07:34:42.5706366Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo"
2025-09-07T07:34:42.5706369Z
2025-09-07T07:34:42.5706371Z
2025-09-07T07:34:42.5706456Z To execute this test, run the following from the base repo dir:
2025-09-07T07:34:42.5706752Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True
2025-09-07T07:34:42.5706783Z
2025-09-07T07:34:42.5706884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-07T07:34:42.5706970Z ----------------------------- Captured stdout call -----------------------------
2025-09-07T07:34:42.5707012Z inline_call []
2025-09-07T07:34:42.5707075Z stats [('calls_captured', 6), ('unique_graphs', 1)]
2025-09-07T07:34:42.5708312Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-09-07T07:34:42.5708398Z ----------------------------- Captured stderr call -----------------------------
2025-09-07T07:34:42.5708736Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error:
2025-09-07T07:34:42.5708867Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward
2025-09-07T07:34:42.5708928Z return torch._higher_order_ops.while_loop(
2025-09-07T07:34:42.5709108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop
2025-09-07T07:34:42.5709212Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple())
2025-09-07T07:34:42.5709365Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.)
2025-09-07T07:34:42.5709509Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-09-07T07:34:42.5709594Z ----------------------------- Captured stdout call -----------------------------
2025-09-07T07:34:42.5709635Z inline_call []
2025-09-07T07:34:42.5709697Z stats [('calls_captured', 6), ('unique_graphs', 1)]
2025-09-07T07:34:42.5709782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-09-07T07:34:42.5709863Z ----------------------------- Captured stderr call -----------------------------
2025-09-07T07:34:42.5710163Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error:
2025-09-07T07:34:42.5710320Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward
2025-09-07T07:34:42.5710379Z return torch._higher_order_ops.while_loop(
2025-09-07T07:34:42.5710555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop
2025-09-07T07:34:42.5710657Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple())
2025-09-07T07:34:42.5710808Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.)
2025-09-07T07:34:42.5710947Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-09-07T07:34:42.5711028Z ----------------------------- Captured stdout call -----------------------------
2025-09-07T07:34:42.5712206Z inline_call []
2025-09-07T07:34:42.5712272Z stats [('calls_captured', 6), ('unique_graphs', 1)]
2025-09-07T07:34:42.5712356Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-09-07T07:34:42.5712436Z ----------------------------- Captured stderr call -----------------------------
2025-09-07T07:34:42.5712759Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error:
2025-09-07T07:34:42.5712889Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward
2025-09-07T07:34:42.5712948Z return torch._higher_order_ops.while_loop(
2025-09-07T07:34:42.5713123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop
2025-09-07T07:34:42.5713222Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple())
2025-09-07T07:34:42.5713377Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.)
2025-09-07T07:34:42.5713532Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-09-07T07:34:42.5713783Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5531b6dec9b76b42.xml -
2025-09-07T07:34:42.5713852Z =========================== short test summary info ============================
2025-09-07T07:34:42.5714295Z FAILED [0.7326s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised:
2025-09-07T07:34:42.5714395Z RuntimeError: only Tensors of floating point dtype can require gradients
2025-09-07T07:34:42.5714397Z
2025-09-07T07:34:42.5714637Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-09-07T07:34:42.5714642Z
2025-09-07T07:34:42.5714644Z
2025-09-07T07:34:42.5714728Z To execute this test, run the following from the base repo dir:
2025-09-07T07:34:42.5714952Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True
2025-09-07T07:34:42.5714956Z
2025-09-07T07:34:42.5715055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-07T07:34:42.5715124Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-09-07T07:34:42.5715205Z ============= 1 failed, 1 passed, 66 deselected, 2 rerun in 5.08s ==============
2025-09-07T07:34:42.5716383Z Got exit code 1
2025-09-07T07:34:42.5716431Z Retrying single test...
2025-09-07T07:34:42.5716981Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html.
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
2025-09-07T07:34:42.5717062Z import pkg_resources
2025-09-07T07:34:42.5717262Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-27c2a2cce5bd2078.xml
2025-09-07T07:34:42.5717329Z ============================= test session starts ==============================
2025-09-07T07:34:42.5717463Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-09-07T07:34:42.5717509Z cachedir: .pytest_cache
2025-09-07T07:34:42.5717692Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-09-07T07:34:42.5717744Z rootdir: /var/lib/jenkins/pytorch
2025-09-07T07:34:42.5717790Z configfile: pytest.ini
2025-09-07T07:34:42.5717981Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0
2025-09-07T07:34:42.5718069Z collecting ... collected 467 items / 245 deselected / 222 selected
2025-09-07T07:34:42.5718339Z stepcurrent: skipping 67 already run items.
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5718410Z Running 1 items in this shard 2025-09-07T07:34:42.5718413Z 2025-09-07T07:34:42.5718644Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1718s] [100%] 2025-09-07T07:34:42.5718869Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7538s] [100%] 2025-09-07T07:34:42.5719063Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True FAILED [0.7105s] [100%] 2025-09-07T07:34:42.5719090Z 2025-09-07T07:34:42.5719146Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5719270Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5719320Z Traceback (most recent call last): 2025-09-07T07:34:42.5720727Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5720796Z self._run_test( 2025-09-07T07:34:42.5720931Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5720995Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5721043Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5721202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5721263Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5721309Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5721486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5721540Z 
return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5721585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5721746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5721798Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5721841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5722009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5722103Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5722168Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5722349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5722403Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5722577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5722641Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5722687Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5723999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5724059Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5724105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5724241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5724324Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5724375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5724525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5724600Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.5724672Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5724839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5724891Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5724935Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5725096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5725142Z return aot_autograd( 2025-09-07T07:34:42.5725183Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5725360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5725439Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5725492Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5725679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5725792Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5725846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5727276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5727328Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5727547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5727597Z fx_g = _create_graph( 2025-09-07T07:34:42.5727639Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5727830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5727871Z fx_g = make_fx( 2025-09-07T07:34:42.5727910Z 
^^^^^^^^ 2025-09-07T07:34:42.5728091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5728145Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5728189Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5728365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5728415Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5728491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5728682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5728727Z t = dispatch_trace( 2025-09-07T07:34:42.5728766Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5728898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5728947Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5728989Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5729135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5729182Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5730359Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5730550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5730645Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5730695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5730841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5730886Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5730925Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5731103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.5731153Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5731195Z ^^^^^^^^^ 2025-09-07T07:34:42.5731349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5731396Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5731436Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5731612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5731700Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5731739Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5731924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5731996Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5732049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5732276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5732324Z outs_pair = fn(*args) 2025-09-07T07:34:42.5732364Z ^^^^^^^^^ 2025-09-07T07:34:42.5733703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5733782Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5733839Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5734044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5734089Z outs_pair = fn(*args) 2025-09-07T07:34:42.5734128Z ^^^^^^^^^ 2025-09-07T07:34:42.5734338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5734407Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5734457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5734684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5734766Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5734841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5735042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5735087Z outs_pair = fn(*args) 2025-09-07T07:34:42.5735127Z ^^^^^^^^^ 2025-09-07T07:34:42.5735353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5735407Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5735450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5735649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5735702Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5735745Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5737121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5737174Z return handle_torch_function( 2025-09-07T07:34:42.5737216Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5737382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5737497Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5737553Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5737751Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5737799Z return func(*args, **kwargs) 2025-09-07T07:34:42.5737840Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5737985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5738059Z result = _engine_run_backward( 2025-09-07T07:34:42.5738100Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5738273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5738415Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5738474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5738644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5738695Z return user_fn(self, *args) 2025-09-07T07:34:42.5738736Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5738906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5738957Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5739004Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5739188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5740388Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5740431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5740581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5740627Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5740670Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5740864Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5740924Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5740970Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5741129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5741216Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5741261Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5741449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5741505Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5741552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5741740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5741785Z t = dispatch_trace( 2025-09-07T07:34:42.5741824Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5741957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5742006Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5742052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5742198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5743382Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5743424Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5743615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5743724Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5743775Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5743922Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5743968Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5744007Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5744156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5744226Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5744267Z ^^^^^^^^^ 2025-09-07T07:34:42.5744443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5744501Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5744540Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5744589Z File "", line 1, in 2025-09-07T07:34:42.5744774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5744867Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5744919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5745079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5745137Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5745183Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5745409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5746699Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5746743Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5746949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5747000Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5747043Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5747212Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5747260Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5747302Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5747490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5747597Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5747651Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5747802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5747873Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5747925Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5748074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5748120Z leaves = list(leaves) 2025-09-07T07:34:42.5748160Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5748306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5748350Z return func(x) 2025-09-07T07:34:42.5748389Z ^^^^^^^ 2025-09-07T07:34:42.5748550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5749773Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5749822Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5750049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5750097Z return func(*args, **kwargs) 2025-09-07T07:34:42.5750140Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5750355Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5750456Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.5750461Z 2025-09-07T07:34:42.5750705Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5750735Z 2025-09-07T07:34:42.5750737Z 2025-09-07T07:34:42.5750823Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5751051Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5751054Z 2025-09-07T07:34:42.5751176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5751264Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5751305Z inline_call [] 2025-09-07T07:34:42.5751369Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5751410Z inductor [] 2025-09-07T07:34:42.5751497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5751583Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5751889Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5752022Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5752084Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5752263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5752363Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5753662Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5753804Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5753955Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5754006Z Traceback (most recent call last): 2025-09-07T07:34:42.5754179Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5754220Z self._run_test( 2025-09-07T07:34:42.5754355Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5754421Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5754468Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5754625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5754679Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5754726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5754905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5754961Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5755006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5755167Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5755245Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5755290Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5755459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5755554Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5755599Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5756995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5757082Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5757261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5757323Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5757370Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5757538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5757621Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5757666Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5757804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5757880Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5757932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5758086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5758163Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5758210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5758375Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5758428Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5758472Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5758635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5758680Z return aot_autograd( 2025-09-07T07:34:42.5758721Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5758881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5758986Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5760244Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5760438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5760535Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5760591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5760811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5760862Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5761081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5761127Z fx_g = _create_graph( 2025-09-07T07:34:42.5761170Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5761364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5761404Z fx_g = make_fx( 2025-09-07T07:34:42.5761443Z ^^^^^^^^ 2025-09-07T07:34:42.5761622Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5761704Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5761750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5761924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5761974Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5762016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5762202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5762267Z t = dispatch_trace( 2025-09-07T07:34:42.5762306Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5763585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5763635Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5763677Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5763825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5763871Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5763933Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5764124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5764216Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5764264Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5764413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5764460Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5764502Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5764649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5764697Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5764737Z ^^^^^^^^^ 2025-09-07T07:34:42.5764896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5764942Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5764984Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5765159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5765217Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5765275Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5765460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5766740Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5766797Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5767006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5767053Z outs_pair = fn(*args) 2025-09-07T07:34:42.5767094Z ^^^^^^^^^ 2025-09-07T07:34:42.5767295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5767372Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5767424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5767631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5767678Z outs_pair = fn(*args) 2025-09-07T07:34:42.5767717Z ^^^^^^^^^ 2025-09-07T07:34:42.5767927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5768027Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5768078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5768307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5768389Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5768442Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5768643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5768713Z outs_pair = fn(*args) 2025-09-07T07:34:42.5768752Z ^^^^^^^^^ 2025-09-07T07:34:42.5768975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5769027Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5770218Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5770440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5770495Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5770537Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5770684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5770736Z return handle_torch_function( 2025-09-07T07:34:42.5770781Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5770946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5771034Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5771086Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5771286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5771332Z return func(*args, **kwargs) 2025-09-07T07:34:42.5771375Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5771520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5771569Z result = _engine_run_backward( 2025-09-07T07:34:42.5771610Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5771809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5771953Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5772012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5772162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5772210Z return user_fn(self, *args) 2025-09-07T07:34:42.5773394Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5773567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5773617Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5773659Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5773845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5773900Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5773943Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5774088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5774134Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5774175Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5774393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5774453Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5774500Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5774659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5774716Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5774763Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5774972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5775027Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5775073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5775261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5775306Z t = dispatch_trace( 2025-09-07T07:34:42.5775361Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5776708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5776760Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5776802Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5776946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5776995Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5777035Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5777225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5777315Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5777361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5777510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5777554Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5777595Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5777742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5777791Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5777830Z ^^^^^^^^^ 2025-09-07T07:34:42.5778046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5778105Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5778145Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5778193Z File "", line 1, in 2025-09-07T07:34:42.5778362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5778452Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5779648Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5779813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5779869Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5779913Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5780140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5780193Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5780235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5780435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5780489Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5780557Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5780728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5780778Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5780820Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5780977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5781079Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5781159Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5781305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5781374Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5781424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5781596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5781642Z leaves = list(leaves) 2025-09-07T07:34:42.5781683Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5782966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5783009Z return func(x) 2025-09-07T07:34:42.5783048Z ^^^^^^^ 2025-09-07T07:34:42.5783209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5783289Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5783336Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5783532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5783580Z return func(*args, **kwargs) 2025-09-07T07:34:42.5783622Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5783838Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5783938Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5783940Z 2025-09-07T07:34:42.5784183Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5784209Z 2025-09-07T07:34:42.5784212Z 2025-09-07T07:34:42.5784296Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5784523Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5784526Z 2025-09-07T07:34:42.5784626Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5784716Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5784756Z inline_call [] 2025-09-07T07:34:42.5784819Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5784858Z inductor [] 2025-09-07T07:34:42.5784948Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5785032Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5786544Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5786683Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5786745Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5786950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5787054Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5787206Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5787347Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5787430Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5787472Z inline_call [] 2025-09-07T07:34:42.5787556Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5787641Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5787723Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5788029Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5788182Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5788244Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5788421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5788522Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5788676Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5788817Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5788875Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5789001Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5789053Z Traceback (most recent call last): 2025-09-07T07:34:42.5790385Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5790428Z self._run_test( 2025-09-07T07:34:42.5790560Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5790623Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5790671Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5790853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5790909Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5790957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5791134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5791190Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5791235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5791396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5791448Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5791492Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5791660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5791758Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5791804Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5791984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5792037Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5792229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5792293Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5793481Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5793649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5793710Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5793754Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5793893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5793991Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5794043Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5794192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5794267Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5794331Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5794497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5794548Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5794593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5794752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5794801Z return aot_autograd( 2025-09-07T07:34:42.5794842Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5795002Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5795083Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5795136Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5795327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5795423Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5795476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5796935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5797019Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5797241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5797288Z fx_g = _create_graph( 2025-09-07T07:34:42.5797328Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5797522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5797561Z fx_g = make_fx( 2025-09-07T07:34:42.5797602Z ^^^^^^^^ 2025-09-07T07:34:42.5797782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5797836Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5797880Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5798055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5798109Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5798151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5798337Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5798382Z t = dispatch_trace( 2025-09-07T07:34:42.5798422Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5798576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5798627Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5798668Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5798817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5800011Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5800054Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5800292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5800415Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5800462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5800609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5800654Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5800696Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5800872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5800922Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5800962Z ^^^^^^^^^ 2025-09-07T07:34:42.5801118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5801165Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5801209Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5801388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5801446Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5801485Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5801672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5801745Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5801797Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5802002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5802048Z outs_pair = fn(*args) 2025-09-07T07:34:42.5803233Z ^^^^^^^^^ 2025-09-07T07:34:42.5803437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5803538Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5803590Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5803796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5803841Z outs_pair = fn(*args) 2025-09-07T07:34:42.5803882Z ^^^^^^^^^ 2025-09-07T07:34:42.5804094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5804162Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5804211Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5804439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5804524Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5804578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5804780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5804826Z outs_pair = fn(*args) 2025-09-07T07:34:42.5804881Z ^^^^^^^^^ 2025-09-07T07:34:42.5805106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5805158Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5805202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5805398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5805455Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5805517Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5806890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5806941Z return handle_torch_function( 2025-09-07T07:34:42.5806984Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5807152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5807267Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5807319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5807517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5807564Z return func(*args, **kwargs) 2025-09-07T07:34:42.5807607Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5807756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5807806Z result = _engine_run_backward( 2025-09-07T07:34:42.5807847Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5808019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5808161Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5808221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5808369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5808418Z return user_fn(self, *args) 2025-09-07T07:34:42.5808460Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5808629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5808706Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5808749Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5810079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5810134Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5810177Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5810323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5810369Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5810409Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5810603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5810662Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5810713Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5810874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5810931Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5810975Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5811188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5811245Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5811294Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5811481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5811527Z t = dispatch_trace( 2025-09-07T07:34:42.5811566Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5811699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5811774Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5811817Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5811961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5813141Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5813182Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5813374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5813485Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5813534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5813679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5813723Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5813763Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5813914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5813966Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5814005Z ^^^^^^^^^ 2025-09-07T07:34:42.5814183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5814239Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5814281Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5814330Z File "", line 1, in 
2025-09-07T07:34:42.5814499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5814589Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5814641Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5814803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5814881Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5814925Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5816286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5816338Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5816381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5816646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5816698Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5816741Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5816912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5816964Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5817007Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5817163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5817266Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5817320Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5817496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5817568Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5817620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5817769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5817814Z leaves = list(leaves) 2025-09-07T07:34:42.5817855Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5817999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5818065Z return func(x) 2025-09-07T07:34:42.5818103Z ^^^^^^^ 2025-09-07T07:34:42.5818265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5819489Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5819541Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5819766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5819815Z return func(*args, **kwargs) 2025-09-07T07:34:42.5819857Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5820070Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5820170Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5820176Z 2025-09-07T07:34:42.5820421Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5820424Z 2025-09-07T07:34:42.5820426Z 2025-09-07T07:34:42.5820510Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5820739Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5820742Z 2025-09-07T07:34:42.5820842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5820929Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5820969Z inline_call [] 2025-09-07T07:34:42.5821033Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5821098Z inductor [] 2025-09-07T07:34:42.5821187Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5821271Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5821575Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5821704Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5821766Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5821941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5823182Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5823337Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5823483Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5823565Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5823605Z inline_call [] 2025-09-07T07:34:42.5823667Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5823773Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5823857Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5824159Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5824288Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5824347Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5824551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5824651Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5824803Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5824943Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5825046Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5825087Z inline_call [] 2025-09-07T07:34:42.5825149Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5825235Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5825316Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5825612Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5825742Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5827019Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5827199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5827299Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5827453Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5827591Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5827845Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-27c2a2cce5bd2078.xml - 2025-09-07T07:34:42.5827971Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5828394Z FAILED [0.7105s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5828493Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5828497Z 2025-09-07T07:34:42.5828738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5828741Z 2025-09-07T07:34:42.5828743Z 2025-09-07T07:34:42.5828827Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5829054Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5829059Z 2025-09-07T07:34:42.5829158Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5829228Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.5829304Z ================== 1 failed, 245 deselected, 2 rerun in 2.94s ================== 2025-09-07T07:34:42.5829371Z Got exit code 1 2025-09-07T07:34:42.5829417Z Retrying single test... 2025-09-07T07:34:42.5829914Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. 
The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5829960Z import pkg_resources 2025-09-07T07:34:42.5830161Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-c253830ee1b04621.xml 2025-09-07T07:34:42.5830249Z ============================= test session starts ============================== 2025-09-07T07:34:42.5831540Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5831588Z cachedir: .pytest_cache 2025-09-07T07:34:42.5831772Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5831846Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5831891Z configfile: pytest.ini 2025-09-07T07:34:42.5832083Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5832172Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.5832444Z stepcurrent: skipping 67 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5832497Z Running 1 items in this shard 2025-09-07T07:34:42.5832500Z 2025-09-07T07:34:42.5832732Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1032s] [100%] 2025-09-07T07:34:42.5832959Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7733s] [100%] 2025-09-07T07:34:42.5833156Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True FAILED [0.8420s] [100%] 2025-09-07T07:34:42.5833158Z 2025-09-07T07:34:42.5833214Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5833340Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5833416Z Traceback (most recent call last): 2025-09-07T07:34:42.5833595Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5833636Z self._run_test( 2025-09-07T07:34:42.5833771Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5833836Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5833885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5834045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5835240Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5835288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5835467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5835524Z 
return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5835572Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5835732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5835784Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5835827Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5836019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5836117Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5836161Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5836340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5836393Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5836635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5836727Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5836774Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5836942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5837001Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5837047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5837212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5837290Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5837341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5838639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5838717Z return lookup_backend(self.backend)(gm, example_inputs) 
2025-09-07T07:34:42.5838769Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5838935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5838986Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5839029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5839192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5839240Z return aot_autograd( 2025-09-07T07:34:42.5839281Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5839442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5839523Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5839577Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5839796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5839894Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5839946Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5840234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5840286Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5840505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5840551Z fx_g = _create_graph( 2025-09-07T07:34:42.5840591Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5840785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5841978Z fx_g = make_fx( 2025-09-07T07:34:42.5842018Z 
^^^^^^^^ 2025-09-07T07:34:42.5842200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5842254Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5842297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5842500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5842551Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5842595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5842782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5842827Z t = dispatch_trace( 2025-09-07T07:34:42.5842867Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5843004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5843072Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5843114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5843262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5843309Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5843352Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5843560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5843652Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5843701Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5843846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5843891Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5845067Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5845217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 
2025-09-07T07:34:42.5845265Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5845306Z ^^^^^^^^^ 2025-09-07T07:34:42.5845462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5845510Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5845554Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5845728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5845788Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5845826Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5846009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5846110Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5846163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5846368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5846414Z outs_pair = fn(*args) 2025-09-07T07:34:42.5846454Z ^^^^^^^^^ 2025-09-07T07:34:42.5846719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5846797Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5846849Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5847050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5847099Z outs_pair = fn(*args) 2025-09-07T07:34:42.5847139Z ^^^^^^^^^ 2025-09-07T07:34:42.5848492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5848564Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5848614Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5848879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5848963Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5849016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5849219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5849267Z outs_pair = fn(*args) 2025-09-07T07:34:42.5849330Z ^^^^^^^^^ 2025-09-07T07:34:42.5849555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5849607Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5849649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5849850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5849930Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5849973Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5850121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5850169Z return handle_torch_function( 2025-09-07T07:34:42.5850212Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5850376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5850470Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5850522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5851857Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5851908Z return func(*args, **kwargs) 2025-09-07T07:34:42.5851951Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5852096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5852146Z result = _engine_run_backward( 2025-09-07T07:34:42.5852187Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5852359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5852527Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5852587Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5852735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5852784Z return user_fn(self, *args) 2025-09-07T07:34:42.5852828Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5852998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5853049Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5853092Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5853278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5853330Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5853373Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5853518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5853564Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5853606Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5853798Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5855011Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5855061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5855222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5855280Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5855325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5855514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5855593Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5855639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5855825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5855870Z t = dispatch_trace( 2025-09-07T07:34:42.5855910Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5856061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5856111Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5856153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5856297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5856342Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5856382Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5856641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5856735Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5856784Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5856929Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5858119Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5858160Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5858312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5858360Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5858401Z ^^^^^^^^^ 2025-09-07T07:34:42.5858576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5858666Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5858708Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5858756Z File "", line 1, in 2025-09-07T07:34:42.5858924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5859014Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5859068Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5859227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5859284Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5859328Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5859558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5859611Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5859655Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5859854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5859906Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5859948Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5860143Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5861328Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5861372Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5861529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5861632Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5861684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5861866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5861936Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5861987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5862135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5862181Z leaves = list(leaves) 2025-09-07T07:34:42.5862221Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5862388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5862429Z return func(x) 2025-09-07T07:34:42.5862467Z ^^^^^^^ 2025-09-07T07:34:42.5862629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5862704Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5862757Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5862952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5863000Z return func(*args, **kwargs) 2025-09-07T07:34:42.5863042Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5863255Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5863356Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.5863359Z 2025-09-07T07:34:42.5864758Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5864761Z 2025-09-07T07:34:42.5864763Z 2025-09-07T07:34:42.5864848Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5865099Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5865102Z 2025-09-07T07:34:42.5865204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5865290Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5865332Z inline_call [] 2025-09-07T07:34:42.5865395Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5865436Z inductor [] 2025-09-07T07:34:42.5865524Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5865607Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5865912Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.5866049Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5866110Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5866289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5866390Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5866656Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.5866798Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5866923Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5866974Z Traceback (most recent call last): 2025-09-07T07:34:42.5867148Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5867223Z self._run_test( 2025-09-07T07:34:42.5868517Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5868583Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5868630Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5868786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5868840Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5868913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5869093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5869148Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5869192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5869354Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5869410Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5869454Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5869622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5869717Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5869763Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5869945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5869998Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5870177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5870240Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5870327Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5870495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5871697Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5871743Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5871883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5871959Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5872011Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5872161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5872234Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5872284Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5872449Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5872503Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5872546Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5872708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5872754Z return aot_autograd( 2025-09-07T07:34:42.5872816Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5872978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5873059Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5873111Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5873301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5873401Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5873472Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5873687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5873737Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5875110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5875159Z fx_g = _create_graph( 2025-09-07T07:34:42.5875199Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5875391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5875432Z fx_g = make_fx( 2025-09-07T07:34:42.5875470Z ^^^^^^^^ 2025-09-07T07:34:42.5875652Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5875709Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5875753Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5875925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5875975Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5876019Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5876207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5876251Z t = dispatch_trace( 2025-09-07T07:34:42.5876292Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5876425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5876475Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5876619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5876767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5876814Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5876855Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5877044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5878290Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5878341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5878489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5878534Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5878574Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5878722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5878775Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5878814Z ^^^^^^^^^ 2025-09-07T07:34:42.5878970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5879017Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5879059Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5879265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5879327Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5879366Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5879549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5879623Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5879674Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5879906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5879952Z outs_pair = fn(*args) 2025-09-07T07:34:42.5879993Z ^^^^^^^^^ 2025-09-07T07:34:42.5880266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5881494Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5881571Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5881777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5881821Z outs_pair = fn(*args) 2025-09-07T07:34:42.5881861Z ^^^^^^^^^ 2025-09-07T07:34:42.5882070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5882145Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5882195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5882425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5882509Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5882563Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5882769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5882814Z outs_pair = fn(*args) 2025-09-07T07:34:42.5882853Z ^^^^^^^^^ 2025-09-07T07:34:42.5883078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5883153Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5883197Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5883396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5883450Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5883493Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5883643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5883692Z return handle_torch_function( 2025-09-07T07:34:42.5884879Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5885047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5885135Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5885192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5885390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5885437Z return func(*args, **kwargs) 2025-09-07T07:34:42.5885479Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5885646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5885696Z result = _engine_run_backward( 2025-09-07T07:34:42.5885740Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5885911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5886053Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5886110Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5886278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5886326Z return user_fn(self, *args) 2025-09-07T07:34:42.5886369Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5886608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5886662Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5886705Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5886921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5886973Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5888170Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5888319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5888369Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5888410Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5888606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5888666Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5888712Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5888874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5888934Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5888979Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5889170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5889225Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5889271Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5889495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5889538Z t = dispatch_trace( 2025-09-07T07:34:42.5889579Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5889712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5889764Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5889805Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5889953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5889998Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5890039Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5891363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5891455Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5891506Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5891653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5891698Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5891738Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5891913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5891963Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5892004Z ^^^^^^^^^ 2025-09-07T07:34:42.5892182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5892238Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5892278Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5892326Z File "", line 1, in 2025-09-07T07:34:42.5892498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5892609Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5892663Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5892823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5893770Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5893838Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5894069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5895493Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5895537Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5895742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5895795Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5895838Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5896007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5896060Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5896102Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5896273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5896377Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5896430Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5896658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5896768Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5896820Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5896970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5897017Z leaves = list(leaves) 2025-09-07T07:34:42.5897057Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5897204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5897246Z return func(x) 2025-09-07T07:34:42.5897287Z ^^^^^^^ 2025-09-07T07:34:42.5897450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5897528Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5897577Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5897778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5897832Z return func(*args, **kwargs) 2025-09-07T07:34:42.5897876Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5898092Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5898194Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5898222Z 2025-09-07T07:34:42.5899680Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5899686Z 2025-09-07T07:34:42.5899689Z 2025-09-07T07:34:42.5899775Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5900003Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5900009Z 2025-09-07T07:34:42.5900109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5900197Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5900238Z inline_call [] 2025-09-07T07:34:42.5900303Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5900343Z inductor [] 2025-09-07T07:34:42.5900435Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5900606Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5900914Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5901049Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5901108Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5901289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5901390Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5901546Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5901689Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5901775Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5901815Z inline_call [] 2025-09-07T07:34:42.5901878Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5901961Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5903205Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5903551Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5903685Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5903743Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5903922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5904023Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5904177Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5904317Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5904376Z =================================== FAILURES =================================== 2025-09-07T07:34:42.5904504Z _ WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5904555Z Traceback (most recent call last): 2025-09-07T07:34:42.5904729Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1285, in test_while_loop_with_parameters 2025-09-07T07:34:42.5904771Z self._run_test( 2025-09-07T07:34:42.5904921Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5904987Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5905036Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5905194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5905248Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5905294Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5905472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5905529Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5905574Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5906960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5907014Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5907059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5907286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5907381Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5907426Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5907605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5907663Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5907839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5907902Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5907949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5908120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5908182Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5908229Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5908366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5908444Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5908495Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5908645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5908741Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5908790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5908958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5910162Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5910210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5910376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5910422Z return aot_autograd( 2025-09-07T07:34:42.5910464Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5910626Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5910707Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5910763Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5910956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5911053Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5911136Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5911356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5911407Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5911626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5911672Z fx_g = _create_graph( 2025-09-07T07:34:42.5911714Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5911910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5911949Z fx_g = make_fx( 2025-09-07T07:34:42.5911987Z ^^^^^^^^ 2025-09-07T07:34:42.5912168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5912221Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5912290Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5913620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5913673Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5913715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5913904Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5913949Z t = dispatch_trace( 2025-09-07T07:34:42.5913990Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5914122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5914171Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5914212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5914359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5914410Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5914454Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5914645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5914737Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5914785Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5914933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5914997Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5915040Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5915189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5915237Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5915278Z ^^^^^^^^^ 2025-09-07T07:34:42.5916636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5916689Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5916730Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5916911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5916969Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5917010Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.5917199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5917272Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5917323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5917531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5917615Z outs_pair = fn(*args) 2025-09-07T07:34:42.5917658Z ^^^^^^^^^ 2025-09-07T07:34:42.5917862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5917941Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5917993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5918198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5918246Z outs_pair = fn(*args) 2025-09-07T07:34:42.5918287Z ^^^^^^^^^ 2025-09-07T07:34:42.5918497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5918567Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5918617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5918896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5920126Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5920245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5920451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.5920498Z outs_pair = fn(*args) 2025-09-07T07:34:42.5920538Z ^^^^^^^^^ 2025-09-07T07:34:42.5920763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5920816Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5920861Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5921068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5921122Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5921164Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5921312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5921362Z return handle_torch_function( 2025-09-07T07:34:42.5921438Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5921605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5921694Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5921748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5921945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5921996Z return func(*args, **kwargs) 2025-09-07T07:34:42.5922038Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5922184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5922232Z result = _engine_run_backward( 2025-09-07T07:34:42.5923424Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5923597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5923743Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5923801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5923950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5924023Z return user_fn(self, *args) 2025-09-07T07:34:42.5924068Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5924238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5924289Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5924332Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5924517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5924573Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5924616Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5924761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5924808Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5924849Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5925044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5925139Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5925186Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5925347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5925404Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5926666Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5926863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.5926918Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5926963Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5927154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5927203Z t = dispatch_trace( 2025-09-07T07:34:42.5927244Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5927379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5927429Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5927471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5927617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5927694Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5927735Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5927925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5928017Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5928064Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5928212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5928258Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5928299Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5928448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5928496Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5928536Z ^^^^^^^^^ 2025-09-07T07:34:42.5929862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5929923Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5929963Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5930011Z File "", line 1, in 
2025-09-07T07:34:42.5930180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5930299Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5930356Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5930518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5930573Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5930618Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5930843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5930896Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5930937Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5931138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5931189Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5931234Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5931449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5931501Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5931542Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5931700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5931804Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5933004Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5933153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5933225Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.5933276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5933425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5933474Z leaves = list(leaves) 2025-09-07T07:34:42.5933515Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5933659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5933701Z return func(x) 2025-09-07T07:34:42.5933738Z ^^^^^^^ 2025-09-07T07:34:42.5933900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5933999Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5934047Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5934244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5934291Z return func(*args, **kwargs) 2025-09-07T07:34:42.5934335Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5934551Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5934652Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5934655Z 2025-09-07T07:34:42.5934897Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5934903Z 2025-09-07T07:34:42.5934905Z 2025-09-07T07:34:42.5934989Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5935216Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5935220Z 2025-09-07T07:34:42.5935319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5936642Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5936687Z inline_call [] 2025-09-07T07:34:42.5936753Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5936793Z inductor [] 2025-09-07T07:34:42.5936879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5936964Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5937271Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5937408Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5937468Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5937648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5937773Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5937956Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5938096Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5938180Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5938220Z inline_call [] 2025-09-07T07:34:42.5938287Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5938372Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5938454Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5938754Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5938887Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5938945Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5940277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5940378Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5940532Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5940707Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5940790Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5940830Z inline_call [] 2025-09-07T07:34:42.5940894Z stats [('calls_captured', 6), ('unique_graphs', 1)] 2025-09-07T07:34:42.5940980Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5941066Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5941368Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5941497Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 823, in forward 2025-09-07T07:34:42.5941558Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5941733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5941830Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5941982Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5943680Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5943939Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-c253830ee1b04621.xml - 2025-09-07T07:34:42.5944007Z =========================== short test summary info ============================ 2025-09-07T07:34:42.5944431Z FAILED [0.8420s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5944531Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5944533Z 2025-09-07T07:34:42.5944776Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5944800Z 2025-09-07T07:34:42.5944802Z 2025-09-07T07:34:42.5944923Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5946335Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.5946338Z 2025-09-07T07:34:42.5946437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5946598Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.5946681Z ================== 1 failed, 245 deselected, 2 rerun in 2.90s ================== 2025-09-07T07:34:42.5946722Z Got exit code 1 2025-09-07T07:34:42.5946867Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.5947372Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.5947422Z import pkg_resources 2025-09-07T07:34:42.5947618Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-7da728ca71127d03.xml 2025-09-07T07:34:42.5947682Z ============================= test session starts ============================== 2025-09-07T07:34:42.5947850Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.5947896Z cachedir: .pytest_cache 2025-09-07T07:34:42.5948080Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.5948132Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.5948178Z configfile: pytest.ini 2025-09-07T07:34:42.5948370Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.5948461Z collecting ... collected 467 items / 68 deselected / 399 selected 2025-09-07T07:34:42.5948520Z stepcurrent: skipping 68 already run items. 
2025-09-07T07:34:42.5948570Z Running 178 items in this shard 2025-09-07T07:34:42.5948572Z 2025-09-07T07:34:42.5948808Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9909s] [ 0%] 2025-09-07T07:34:42.5949040Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7827s] [ 0%] 2025-09-07T07:34:42.5950392Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True FAILED [0.7783s] [ 0%] 2025-09-07T07:34:42.5950397Z 2025-09-07T07:34:42.5950483Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.5950616Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5950669Z Traceback (most recent call last): 2025-09-07T07:34:42.5950850Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.5950892Z self._run_test( 2025-09-07T07:34:42.5951025Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5951094Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5951142Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5951302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5951357Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5951403Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5951629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5951685Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5951729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5951890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5951942Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5951987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5952155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5952250Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5952295Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5953611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5953669Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5953847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5953909Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5953956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5954121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5954202Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5954247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5954385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5954461Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5954512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5954667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5954740Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5954788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5954951Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5955004Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5955051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5955214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5955260Z return aot_autograd( 2025-09-07T07:34:42.5955302Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5955461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.5955559Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5956830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5957027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5957123Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5957176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5957393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5957444Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5957661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5957707Z fx_g = _create_graph( 2025-09-07T07:34:42.5957781Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5957994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5958035Z fx_g = make_fx( 2025-09-07T07:34:42.5958074Z ^^^^^^^^ 2025-09-07T07:34:42.5958253Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5958308Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5958352Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5958524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5958575Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5958617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5958806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5958853Z t = dispatch_trace( 2025-09-07T07:34:42.5958894Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5959029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5960289Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5960332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5960481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5960557Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5960599Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5960789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5960882Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5960930Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5961079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5961128Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5961169Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5961316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5961365Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5961405Z ^^^^^^^^^ 2025-09-07T07:34:42.5961560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5961608Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5961650Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5961824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5961882Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5961943Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5962131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5963344Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5963399Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5963605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5963654Z outs_pair = fn(*args) 2025-09-07T07:34:42.5963694Z ^^^^^^^^^ 2025-09-07T07:34:42.5963896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5963974Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5964024Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5964249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5964322Z outs_pair = fn(*args) 2025-09-07T07:34:42.5964363Z ^^^^^^^^^ 2025-09-07T07:34:42.5964570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5964642Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.5964694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5964921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5965003Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5965058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5965261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5965307Z outs_pair = fn(*args) 2025-09-07T07:34:42.5965346Z ^^^^^^^^^ 2025-09-07T07:34:42.5965569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5965620Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5966972Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5967211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5967265Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5967308Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5967457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5967507Z return handle_torch_function( 2025-09-07T07:34:42.5967552Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5967718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5967806Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.5967858Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5968056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.5968106Z return func(*args, **kwargs) 2025-09-07T07:34:42.5968149Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5968293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.5968341Z result = _engine_run_backward( 2025-09-07T07:34:42.5968384Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5968579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.5968722Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5968778Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5968926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.5968976Z return user_fn(self, *args) 2025-09-07T07:34:42.5970166Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5970336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.5970387Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.5970430Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5970616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.5970721Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.5970765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5970909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5970955Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5970995Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5971185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.5971246Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.5971292Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5971451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.5971508Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.5971555Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5971746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.5971801Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.5971847Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5972030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.5972096Z t = dispatch_trace( 2025-09-07T07:34:42.5972136Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5973383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5973434Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5973475Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5973620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5973667Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5973709Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5973896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5973987Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5974033Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5974176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5974223Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5974263Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5974407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5974456Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5974494Z ^^^^^^^^^ 2025-09-07T07:34:42.5974697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5974756Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5974795Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5974843Z File "", line 1, in 2025-09-07T07:34:42.5975009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.5975098Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.5976375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5976606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.5976662Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.5976705Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5976961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5977033Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5977075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5977272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.5977323Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.5977365Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5977529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.5977578Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.5977619Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.5977776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.5977880Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.5977939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5978082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.5978152Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.5978202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5978348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.5978418Z leaves = list(leaves) 2025-09-07T07:34:42.5978458Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.5979816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.5979860Z return func(x) 2025-09-07T07:34:42.5979897Z ^^^^^^^ 2025-09-07T07:34:42.5980058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.5980138Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.5980186Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5980377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.5980424Z return func(*args, **kwargs) 2025-09-07T07:34:42.5980465Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5980675Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.5980772Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.5980775Z 2025-09-07T07:34:42.5981041Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.5981045Z 2025-09-07T07:34:42.5981047Z 2025-09-07T07:34:42.5981132Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.5981359Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.5981362Z 2025-09-07T07:34:42.5981459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.5981545Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.5981586Z inline_call [] 2025-09-07T07:34:42.5981652Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.5981691Z inductor [] 2025-09-07T07:34:42.5981777Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.5981859Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.5983304Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.5983459Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.5983517Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.5983692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.5983791Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.5983941Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.5984078Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.5984204Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.5984257Z Traceback (most recent call last): 2025-09-07T07:34:42.5984430Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.5984471Z self._run_test( 2025-09-07T07:34:42.5984599Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.5984662Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.5984708Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5984883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.5984936Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.5984980Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5985148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.5985199Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.5985245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5985395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.5985444Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.5986644Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5986805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.5986897Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.5986941Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5987107Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.5987157Z raise BackendCompilerFailed( 2025-09-07T07:34:42.5987356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.5987420Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5987464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5987620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.5987675Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.5987718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5987848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.5987920Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.5987968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5988107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.5988178Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.5988278Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5988432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.5988480Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.5988521Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5989753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.5989799Z return aot_autograd( 2025-09-07T07:34:42.5989838Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.5989988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.5990064Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.5990114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5990298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.5990389Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.5990440Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5990642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.5990717Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.5990923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.5990966Z fx_g = _create_graph( 2025-09-07T07:34:42.5991005Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5991187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.5991227Z fx_g = make_fx( 2025-09-07T07:34:42.5991264Z ^^^^^^^^ 2025-09-07T07:34:42.5991432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.5991481Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.5991524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5991685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.5991734Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.5992839Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5993016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.5993057Z t = dispatch_trace( 2025-09-07T07:34:42.5993095Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5993241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.5993293Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.5993332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5993470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5993514Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5993553Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5993731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.5993820Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.5993866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5994005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.5994047Z return fn(*args, **kwargs) 2025-09-07T07:34:42.5994086Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5994263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.5994308Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.5994346Z ^^^^^^^^^ 2025-09-07T07:34:42.5994491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.5994535Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.5994575Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5995780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.5995832Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.5995867Z ^^^^^^^^^^^ 2025-09-07T07:34:42.5996027Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.5996096Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.5996144Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5996326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5996365Z outs_pair = fn(*args) 2025-09-07T07:34:42.5996401Z ^^^^^^^^^ 2025-09-07T07:34:42.5996657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.5996764Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.5996809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5996986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5997026Z outs_pair = fn(*args) 2025-09-07T07:34:42.5997061Z ^^^^^^^^^ 2025-09-07T07:34:42.5997245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.5997305Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.5997348Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5997546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.5997619Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.5997665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5998843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.5998884Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.5998961Z ^^^^^^^^^ 2025-09-07T07:34:42.5999157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.5999203Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.5999240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5999412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.5999461Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.5999499Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5999627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.5999671Z return handle_torch_function( 2025-09-07T07:34:42.5999708Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.5999853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.5999972Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6000019Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6000234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6000277Z return func(*args, **kwargs) 2025-09-07T07:34:42.6000313Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6000441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6000483Z result = _engine_run_backward( 2025-09-07T07:34:42.6000520Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6000668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6001797Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.6001853Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6001983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6002025Z return user_fn(self, *args) 2025-09-07T07:34:42.6002063Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6002209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6002276Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6002313Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6002473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6002519Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6002557Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6002686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6002726Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6002763Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6002931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6002985Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6003028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6003167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6003216Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6003255Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6003434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6004463Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.6004504Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6004666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6004704Z t = dispatch_trace( 2025-09-07T07:34:42.6004740Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6004856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6004903Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6004939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6005063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6005101Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6005136Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6005317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6005417Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6005457Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6005581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6005619Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6005653Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6005783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6005824Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6005858Z ^^^^^^^^^ 2025-09-07T07:34:42.6006009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6006058Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6007169Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6007216Z File "", line 1, in 2025-09-07T07:34:42.6007359Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6007438Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6007483Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6007618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6007709Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6007750Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6007946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6007998Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6009066Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6009245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6009288Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6009325Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6009467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6009512Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6009547Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6009680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6009768Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6009814Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6009962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6010995Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.6011041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6011168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6011207Z leaves = list(leaves) 2025-09-07T07:34:42.6011240Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6011366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6011401Z return func(x) 2025-09-07T07:34:42.6011434Z ^^^^^^^ 2025-09-07T07:34:42.6011573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6011638Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6011703Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6011891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6011932Z return func(*args, **kwargs) 2025-09-07T07:34:42.6011967Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6012149Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6012234Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6012237Z 2025-09-07T07:34:42.6012444Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6012447Z 2025-09-07T07:34:42.6012449Z 2025-09-07T07:34:42.6012522Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6012719Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6012721Z 2025-09-07T07:34:42.6012806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6012879Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6012915Z inline_call [] 2025-09-07T07:34:42.6013928Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6013988Z inductor [] 2025-09-07T07:34:42.6014062Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6014134Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6014391Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6014509Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6014561Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6014712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6014797Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6014928Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6015050Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6015121Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6015155Z inline_call [] 2025-09-07T07:34:42.6015211Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6015300Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6015373Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6015624Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6015734Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6015784Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6015937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6016021Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6017188Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6017310Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6017417Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6017527Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6017571Z Traceback (most recent call last): 2025-09-07T07:34:42.6017720Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6017758Z self._run_test( 2025-09-07T07:34:42.6017870Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6017925Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6017965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6018097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6018143Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6018185Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6018337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6018383Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6018422Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6018558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6018625Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6018662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6018805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6018886Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6018924Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6020051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6020098Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6020249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6020302Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6020342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6020488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6020537Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6020576Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6020693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6020783Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6020831Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6020962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6021029Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6021070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6021208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6021254Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6021291Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6021430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6021468Z return aot_autograd( 2025-09-07T07:34:42.6021503Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6021641Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6022726Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6022774Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6022935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6023019Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6023064Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6023247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6023289Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6023476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6023519Z fx_g = _create_graph( 2025-09-07T07:34:42.6023554Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6023717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6023752Z fx_g = make_fx( 2025-09-07T07:34:42.6023784Z ^^^^^^^^ 2025-09-07T07:34:42.6023937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6024000Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6024039Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6024184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6024227Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6024263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6024424Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6024460Z t = dispatch_trace( 2025-09-07T07:34:42.6024495Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6025574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6025621Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6025656Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6025784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6025823Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6025859Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6026021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6026133Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6026175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6026300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6026339Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6026373Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6026570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6026611Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6026647Z ^^^^^^^^^ 2025-09-07T07:34:42.6026783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6026824Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6026858Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6027008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6027086Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6027140Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6027299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6028342Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6028387Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6028563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6028603Z outs_pair = fn(*args) 2025-09-07T07:34:42.6028638Z ^^^^^^^^^ 2025-09-07T07:34:42.6028810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6028877Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6028926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6029102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6029140Z outs_pair = fn(*args) 2025-09-07T07:34:42.6029177Z ^^^^^^^^^ 2025-09-07T07:34:42.6029354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6029439Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6029481Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6029676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6029746Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6029793Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6029968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6030006Z outs_pair = fn(*args) 2025-09-07T07:34:42.6030040Z ^^^^^^^^^ 2025-09-07T07:34:42.6030230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6031237Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6031275Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6031444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6031489Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6031527Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6031674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6031721Z return handle_torch_function( 2025-09-07T07:34:42.6031757Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6031901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6031975Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6032020Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6032190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6032231Z return func(*args, **kwargs) 2025-09-07T07:34:42.6032267Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6032390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6032432Z result = _engine_run_backward( 2025-09-07T07:34:42.6032489Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6032648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6032770Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6032821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6032948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6032989Z return user_fn(self, *args) 2025-09-07T07:34:42.6033991Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6034137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6034182Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6034219Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6034382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6034425Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6034462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6034585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6034624Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6034683Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6034847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6034899Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6034939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6035078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6035127Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6035167Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6035332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6035380Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6035418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6035577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6035614Z t = dispatch_trace( 2025-09-07T07:34:42.6036682Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6036797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6036840Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6036876Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6037037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6037077Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6037112Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6037272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6037350Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6037392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6037516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6037553Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6037587Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6037713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6037755Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6037819Z ^^^^^^^^^ 2025-09-07T07:34:42.6037988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6038039Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6038072Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6038114Z File "", line 1, in 
2025-09-07T07:34:42.6038257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6038340Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6039361Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6039499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6039546Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6039586Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6039781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6039825Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6039860Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6040031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6040099Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6040137Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6040334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6040377Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6040412Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6040549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6040641Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6040688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6040814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6040873Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6040919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6041045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6041083Z leaves = list(leaves) 2025-09-07T07:34:42.6041118Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6042212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6042269Z return func(x) 2025-09-07T07:34:42.6042305Z ^^^^^^^ 2025-09-07T07:34:42.6042445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6042510Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6042550Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6042719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6042762Z return func(*args, **kwargs) 2025-09-07T07:34:42.6042797Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6042980Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6043066Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6043068Z 2025-09-07T07:34:42.6043296Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6043315Z 2025-09-07T07:34:42.6043317Z 2025-09-07T07:34:42.6043391Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6043587Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6043589Z 2025-09-07T07:34:42.6043677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6043750Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6043786Z inline_call [] 2025-09-07T07:34:42.6043842Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6043876Z inductor [] 2025-09-07T07:34:42.6043949Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6044022Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6045246Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6045359Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6045410Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6045583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6045669Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6045799Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6045920Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6045994Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6046028Z inline_call [] 2025-09-07T07:34:42.6046084Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6046156Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6046226Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6046537Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6046650Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6046701Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6046852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6046960Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6047091Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6047210Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6047281Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6047315Z inline_call [] 2025-09-07T07:34:42.6048363Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6048436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6048508Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6048762Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6048918Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6048968Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6049117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6049201Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6049330Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6049453Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6049669Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-7da728ca71127d03.xml - 2025-09-07T07:34:42.6049729Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6050099Z FAILED [0.7783s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6050183Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6050186Z 2025-09-07T07:34:42.6050394Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6050417Z 2025-09-07T07:34:42.6050419Z 2025-09-07T07:34:42.6050490Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6050685Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6050688Z 2025-09-07T07:34:42.6050773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6050834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6050897Z ================== 1 failed, 68 deselected, 2 rerun in 2.73s =================== 2025-09-07T07:34:42.6050933Z Got exit code 1 2025-09-07T07:34:42.6050971Z Retrying single test... 2025-09-07T07:34:42.6052372Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6052415Z import pkg_resources 2025-09-07T07:34:42.6052585Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-649d975a76d6be6e.xml 2025-09-07T07:34:42.6052658Z ============================= test session starts ============================== 2025-09-07T07:34:42.6052775Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6052814Z cachedir: .pytest_cache 2025-09-07T07:34:42.6052970Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6053015Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6053053Z configfile: pytest.ini 2025-09-07T07:34:42.6053215Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6053292Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6053525Z stepcurrent: skipping 68 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6053587Z Running 1 items in this shard 2025-09-07T07:34:42.6053590Z 2025-09-07T07:34:42.6053799Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9794s] [100%] 2025-09-07T07:34:42.6053993Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7815s] [100%] 2025-09-07T07:34:42.6054162Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True FAILED [0.8066s] [100%] 2025-09-07T07:34:42.6054165Z 2025-09-07T07:34:42.6054214Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6054324Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6054368Z Traceback (most recent call last): 2025-09-07T07:34:42.6054523Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6054562Z self._run_test( 2025-09-07T07:34:42.6055840Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6055898Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6055938Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6056074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6056141Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6056180Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6056331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6056378Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6056418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6056646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6056693Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6056731Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6056873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6056955Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6056997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6057150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6057196Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6057346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6057433Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6057474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6057620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6057670Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6058694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6058811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6058879Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6058923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6059050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6059112Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6059155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6059343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6059389Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6059426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6059566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6059604Z return aot_autograd( 2025-09-07T07:34:42.6059642Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6059779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6059849Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6059894Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6060057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6060143Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6060188Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6060372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6060415Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6061571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6061652Z fx_g = _create_graph( 2025-09-07T07:34:42.6061687Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6061856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6061890Z fx_g = make_fx( 
2025-09-07T07:34:42.6061924Z ^^^^^^^^ 2025-09-07T07:34:42.6062082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6062129Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6062167Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6062313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6062356Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6062394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6062552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6062589Z t = dispatch_trace( 2025-09-07T07:34:42.6062624Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6062737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6062797Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6062835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6062964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6063003Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6063040Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6063201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6064248Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6064290Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6064417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6064454Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6064489Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6064617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6064694Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6064729Z ^^^^^^^^^ 2025-09-07T07:34:42.6064863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6064902Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6064937Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6065087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6065138Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6065173Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6065329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6065392Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6065437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6065616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6065654Z outs_pair = fn(*args) 2025-09-07T07:34:42.6065689Z ^^^^^^^^^ 2025-09-07T07:34:42.6065861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6066981Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6067026Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6067203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6067241Z outs_pair = fn(*args) 2025-09-07T07:34:42.6067276Z ^^^^^^^^^ 2025-09-07T07:34:42.6067456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6067518Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6067560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6067755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6067824Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6067872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6068045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6068084Z outs_pair = fn(*args) 2025-09-07T07:34:42.6068117Z ^^^^^^^^^ 2025-09-07T07:34:42.6068331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6068379Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6068417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6068586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6068632Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6068668Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6068796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6068839Z return handle_torch_function( 2025-09-07T07:34:42.6069846Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6069990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6070066Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6070161Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6070332Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6070374Z return func(*args, **kwargs) 2025-09-07T07:34:42.6070410Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6070534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6070577Z result = _engine_run_backward( 2025-09-07T07:34:42.6070613Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6070759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6070881Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6070931Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6071061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6071102Z return user_fn(self, *args) 2025-09-07T07:34:42.6071139Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6071285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6071350Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6071386Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6071544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6071587Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6072591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6072718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6072760Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6072796Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6072963Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6073015Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6073055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6073191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6073243Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6073281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6073442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6073489Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6073546Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6073709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6073747Z t = dispatch_trace( 2025-09-07T07:34:42.6073781Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6073895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6073939Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6073976Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6074100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6074139Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6074174Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6075296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6075397Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6075451Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6075577Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6075615Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6075649Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6075778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6075820Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6075853Z ^^^^^^^^^ 2025-09-07T07:34:42.6076004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6076052Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6076086Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6076128Z File "", line 1, in 2025-09-07T07:34:42.6076274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6076353Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6076397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6076621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6076703Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6076742Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6076933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6076976Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6077012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6078173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6078219Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6078257Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6078399Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6078442Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6078477Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6078614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6078702Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6078748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6078898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6078963Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6079007Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6079134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6079171Z leaves = list(leaves) 2025-09-07T07:34:42.6079206Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6079329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6079365Z return func(x) 2025-09-07T07:34:42.6079398Z ^^^^^^^ 2025-09-07T07:34:42.6079535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.6079600Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6079640Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6080825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6080913Z return func(*args, **kwargs) 2025-09-07T07:34:42.6080949Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6081130Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6081216Z RuntimeError: only Tensors of
floating point dtype can require gradients 2025-09-07T07:34:42.6081219Z 2025-09-07T07:34:42.6081425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6081427Z 2025-09-07T07:34:42.6081429Z 2025-09-07T07:34:42.6081501Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6081699Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6081704Z 2025-09-07T07:34:42.6081790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6081865Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6081900Z inline_call [] 2025-09-07T07:34:42.6081958Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6081992Z inductor [] 2025-09-07T07:34:42.6082066Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6082158Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6082415Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6082528Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6082583Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6082736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6082821Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6082954Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6084055Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6084169Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6084213Z Traceback (most recent call last): 2025-09-07T07:34:42.6084363Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6084398Z self._run_test( 2025-09-07T07:34:42.6084527Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6084585Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6084626Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6084758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6084804Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6084842Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6084994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6085040Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6085080Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6085216Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6085261Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6085316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6085479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6085559Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6085598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6085749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6085796Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6086981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6087038Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6087078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6087224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6087278Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6087318Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6087434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6087499Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6087543Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6087698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6087762Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6087803Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6087944Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6087990Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6088029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6088168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6088209Z return aot_autograd( 2025-09-07T07:34:42.6088243Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6088380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6088448Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6088494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6088654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6089709Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6089776Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6089964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6090006Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6090193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6090232Z fx_g = _create_graph( 2025-09-07T07:34:42.6090270Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6090434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6090469Z fx_g = make_fx( 2025-09-07T07:34:42.6090501Z ^^^^^^^^ 2025-09-07T07:34:42.6090653Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6090700Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6090779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6090926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6090970Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6091006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6091165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6091204Z t = dispatch_trace( 2025-09-07T07:34:42.6091238Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6091353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6091394Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6092491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6092618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6092663Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6092698Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6092860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6092937Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6092979Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6093124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6093163Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6093197Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6093323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6093364Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6093402Z ^^^^^^^^^ 2025-09-07T07:34:42.6093535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6093576Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6093610Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6093758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6093808Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6093842Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6093999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6094059Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6094104Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6095259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6095300Z outs_pair = fn(*args) 2025-09-07T07:34:42.6095337Z ^^^^^^^^^ 2025-09-07T07:34:42.6095509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6095574Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6095619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6095794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6095832Z outs_pair = fn(*args) 2025-09-07T07:34:42.6095866Z ^^^^^^^^^ 2025-09-07T07:34:42.6096043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6096103Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6096162Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6096377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6096449Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6096554Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6096728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6096769Z outs_pair = fn(*args) 2025-09-07T07:34:42.6096803Z ^^^^^^^^^ 2025-09-07T07:34:42.6096992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6097040Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6097079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6098225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6098272Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6098310Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6098438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6098506Z return handle_torch_function( 2025-09-07T07:34:42.6098541Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6098683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6098756Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6098801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6098970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6099013Z return func(*args, **kwargs) 2025-09-07T07:34:42.6099049Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6099172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6099213Z result = _engine_run_backward( 2025-09-07T07:34:42.6099249Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6099396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6099516Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6099565Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6099709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6099752Z return user_fn(self, *args) 2025-09-07T07:34:42.6099789Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6099934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6100946Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6100984Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6101142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6101189Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6101225Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6101350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6101389Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6101424Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6101614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6101687Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6101726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6101861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6101910Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6101949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6102110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6102158Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6102197Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6102358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6102398Z t = dispatch_trace( 2025-09-07T07:34:42.6102433Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6102547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6103556Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6103594Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6103718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6103777Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6103811Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6103971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6104048Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6104089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6104217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6104256Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6104289Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6104417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6104458Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6104492Z ^^^^^^^^^ 2025-09-07T07:34:42.6104643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6104693Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6104726Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6104768Z File "<string>", line 1, in <lambda> 2025-09-07T07:34:42.6104910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6105006Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6105054Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6105191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6106218Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6106257Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6106448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6106557Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6106593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6106764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6106809Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6106873Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6107036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6107079Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6107116Z ^^^^^^^^^^^^^^^^^^^^^
2025-09-07T07:34:42.6107249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6107341Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6107389Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6107516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6107576Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6107620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6107751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6107790Z leaves = list(leaves) 2025-09-07T07:34:42.6107823Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6107946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6107981Z return func(x) 2025-09-07T07:34:42.6109011Z ^^^^^^^ 2025-09-07T07:34:42.6109150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.6109260Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6109300Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6109468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6109508Z return func(*args, **kwargs) 2025-09-07T07:34:42.6109548Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6109729Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6109815Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6109818Z 2025-09-07T07:34:42.6110023Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6110028Z 2025-09-07T07:34:42.6110030Z 2025-09-07T07:34:42.6110102Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6110297Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6110299Z 2025-09-07T07:34:42.6110404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6110482Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6110517Z inline_call [] 2025-09-07T07:34:42.6110574Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6110609Z inductor [] 2025-09-07T07:34:42.6110682Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6110754Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6111015Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6111127Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6112170Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6112324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6112443Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6112577Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6112696Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6112767Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6112805Z inline_call [] 2025-09-07T07:34:42.6112862Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6112933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6113003Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6113259Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6113374Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6113423Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6113573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6113657Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6113804Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6113924Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6113973Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6114083Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6114129Z Traceback (most recent call last): 2025-09-07T07:34:42.6114280Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6114316Z self._run_test( 2025-09-07T07:34:42.6115433Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6115487Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6115528Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6115664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6115711Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6115750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6115900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6115965Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6116009Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6116146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6116190Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6116226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6116369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6116451Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6116616Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6116770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6116817Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6116967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6117074Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6117115Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6117257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6118304Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6118346Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6118464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6118530Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6118574Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6118701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6118767Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6118814Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6118956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6119000Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6119037Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6119174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6119240Z return aot_autograd( 2025-09-07T07:34:42.6119276Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6119412Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6119481Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6119528Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6119692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6119775Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6119821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6120006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6120049Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6121290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6121330Z fx_g = _create_graph( 2025-09-07T07:34:42.6121368Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6121556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6121593Z fx_g = make_fx( 2025-09-07T07:34:42.6121627Z ^^^^^^^^ 2025-09-07T07:34:42.6121780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6121825Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6121864Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6122008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6122054Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6122090Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6122249Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6122287Z t = dispatch_trace( 2025-09-07T07:34:42.6122321Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6122436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6122514Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6122553Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6122679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6122719Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6122754Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6123886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6123967Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6124010Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6124135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6124174Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6124211Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6124342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6124383Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6124417Z ^^^^^^^^^ 2025-09-07T07:34:42.6124549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6124590Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6124648Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6124798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6124846Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6124880Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6125037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6125105Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6125150Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6125325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6125364Z outs_pair = fn(*args) 2025-09-07T07:34:42.6125398Z ^^^^^^^^^ 2025-09-07T07:34:42.6125571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6126689Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6126736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6126909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6126973Z outs_pair = fn(*args) 2025-09-07T07:34:42.6127008Z ^^^^^^^^^ 2025-09-07T07:34:42.6127190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6127250Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6127293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6127486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6127560Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6127606Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6127778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6127816Z outs_pair = fn(*args) 2025-09-07T07:34:42.6127875Z ^^^^^^^^^ 2025-09-07T07:34:42.6128083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6128129Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6128165Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6128334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6128380Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6128417Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6128543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6129574Z return handle_torch_function( 2025-09-07T07:34:42.6129611Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6129755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6129832Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6129878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6130046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6130088Z return func(*args, **kwargs) 2025-09-07T07:34:42.6130124Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6130277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6130319Z result = _engine_run_backward( 2025-09-07T07:34:42.6130356Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6130501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6130621Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6130673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6130798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6130841Z return user_fn(self, *args) 2025-09-07T07:34:42.6130877Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6131022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6131065Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6131103Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6131260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6131304Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6132322Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6132450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6132488Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6132524Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6132688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6132740Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6132783Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6132920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6132969Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6133008Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6133171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6133252Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6133291Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6133450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6133489Z t = dispatch_trace( 2025-09-07T07:34:42.6133523Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6133635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6133679Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6133715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6133839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6133878Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6133912Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6135042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6135122Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6135163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6135286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6135325Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6135359Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6135509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6135550Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6135585Z ^^^^^^^^^ 2025-09-07T07:34:42.6135734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6135785Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6135819Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6135862Z File "", line 1, in 
2025-09-07T07:34:42.6136007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6136085Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6136131Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6136269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6136318Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6136356Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6136629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6136699Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6137715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6137889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6137933Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6137970Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6138113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6138157Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6138193Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6138326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6138414Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6138459Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6138630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6138690Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6138734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6138859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6138898Z leaves = list(leaves) 2025-09-07T07:34:42.6138934Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6139058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6139093Z return func(x) 2025-09-07T07:34:42.6139126Z ^^^^^^^ 2025-09-07T07:34:42.6139265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6139330Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6139375Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6140514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6140555Z return func(*args, **kwargs) 2025-09-07T07:34:42.6140592Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6140772Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6140886Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6140888Z 2025-09-07T07:34:42.6141094Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6141096Z 2025-09-07T07:34:42.6141098Z 2025-09-07T07:34:42.6141172Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6141370Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6141373Z 2025-09-07T07:34:42.6141458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6141532Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6141567Z inline_call [] 2025-09-07T07:34:42.6141623Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6141660Z inductor [] 2025-09-07T07:34:42.6141733Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6141804Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6142061Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6142191Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6142243Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6142397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6142484Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6142615Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6143697Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6143770Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6143804Z inline_call [] 2025-09-07T07:34:42.6143861Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6143933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6144039Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6144293Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6144404Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6144454Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6144607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6144692Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6144822Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6144941Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6145015Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6145049Z inline_call [] 2025-09-07T07:34:42.6145103Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6145176Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6145245Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6145520Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6145628Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6145678Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6145826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6146963Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6147093Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6147210Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6147426Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-649d975a76d6be6e.xml - 2025-09-07T07:34:42.6147487Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6147875Z FAILED [0.8066s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6147963Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6147965Z 2025-09-07T07:34:42.6148172Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6148174Z 2025-09-07T07:34:42.6148176Z 2025-09-07T07:34:42.6148247Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6148442Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6148444Z 2025-09-07T07:34:42.6148528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6148588Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6148656Z ================== 1 failed, 245 deselected, 2 rerun in 2.79s ================== 2025-09-07T07:34:42.6148723Z Got exit code 1 2025-09-07T07:34:42.6148782Z Retrying single test... 2025-09-07T07:34:42.6149210Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6149248Z import pkg_resources 2025-09-07T07:34:42.6149420Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-cd74aa0771c0c80a.xml 2025-09-07T07:34:42.6149475Z ============================= test session starts ============================== 2025-09-07T07:34:42.6149589Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6149627Z cachedir: .pytest_cache 2025-09-07T07:34:42.6150768Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6150815Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6150854Z configfile: pytest.ini 2025-09-07T07:34:42.6151017Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6151094Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6151348Z stepcurrent: skipping 68 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6151392Z Running 1 items in this shard 2025-09-07T07:34:42.6151394Z 2025-09-07T07:34:42.6151590Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.0175s] [100%] 2025-09-07T07:34:42.6151786Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7819s] [100%] 2025-09-07T07:34:42.6151957Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True FAILED [0.8014s] [100%] 2025-09-07T07:34:42.6151959Z 2025-09-07T07:34:42.6152009Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6152119Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6152166Z Traceback (most recent call last): 2025-09-07T07:34:42.6152319Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6152354Z self._run_test( 2025-09-07T07:34:42.6152469Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6152544Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6152588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6152724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6152770Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6152808Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6153933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6153983Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6154021Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6154159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6154202Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6154239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6154405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6154503Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6154542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6154693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6154740Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6154891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6154946Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6154986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6155133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6155185Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6155226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6155343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6155410Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6155453Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6155579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6155665Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6156756Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6156897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6156942Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6156979Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6157122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6157161Z return aot_autograd( 2025-09-07T07:34:42.6157198Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6157333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6157403Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6157448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6157611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6157694Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6157738Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6157945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6157990Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6158182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6158222Z fx_g = _create_graph( 2025-09-07T07:34:42.6158258Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6158420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6158456Z fx_g = make_fx( 
2025-09-07T07:34:42.6158489Z ^^^^^^^^ 2025-09-07T07:34:42.6159623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6159672Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6159710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6159903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6159948Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6159984Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6160203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6160242Z t = dispatch_trace( 2025-09-07T07:34:42.6160277Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6160391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6160433Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6160469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6160594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6160634Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6160674Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6160836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6160916Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6160956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6161081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6161140Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6161175Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6161305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6162320Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6162356Z ^^^^^^^^^ 2025-09-07T07:34:42.6162491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6162534Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6162570Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6162720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6162769Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6162804Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6162960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6163025Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6163068Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6163243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6163282Z outs_pair = fn(*args) 2025-09-07T07:34:42.6163335Z ^^^^^^^^^ 2025-09-07T07:34:42.6163511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6163578Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6163622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6163795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6163835Z outs_pair = fn(*args) 2025-09-07T07:34:42.6163869Z ^^^^^^^^^ 2025-09-07T07:34:42.6164046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6165078Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6165122Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6165359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6165430Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6165476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6165648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6165687Z outs_pair = fn(*args) 2025-09-07T07:34:42.6165721Z ^^^^^^^^^ 2025-09-07T07:34:42.6165911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6165955Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6165991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6166162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6166208Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6166246Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6166371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6166413Z return handle_torch_function( 2025-09-07T07:34:42.6166449Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6166720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6166795Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6166840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6167007Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6167050Z return func(*args, **kwargs) 2025-09-07T07:34:42.6168067Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6168193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6168234Z result = _engine_run_backward( 2025-09-07T07:34:42.6168270Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6168416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6168540Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6168588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6168715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6168755Z return user_fn(self, *args) 2025-09-07T07:34:42.6168822Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6168970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6169014Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6169050Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6169211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6169255Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6169293Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6169418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6169457Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6169493Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6169658Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6169732Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6169793Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6170902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6170952Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6170991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6171152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6171200Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6171238Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6171397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6171435Z t = dispatch_trace( 2025-09-07T07:34:42.6171471Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6171588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6171632Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6171668Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6171792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6171830Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6171865Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6172055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6172134Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6172174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6172300Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6172341Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6172375Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6173467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6173511Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6173544Z ^^^^^^^^^ 2025-09-07T07:34:42.6173693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6173745Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6173780Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6173822Z File "", line 1, in 2025-09-07T07:34:42.6173965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6174043Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6174106Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6174246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6174293Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6174331Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6174523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6174568Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6174603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6174776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6174819Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6174856Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6175000Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6175078Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6175115Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6176219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6176309Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6176355Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6176547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6176609Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6176651Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6176778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6176819Z leaves = list(leaves) 2025-09-07T07:34:42.6176855Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6176978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6177013Z return func(x) 2025-09-07T07:34:42.6177046Z ^^^^^^^ 2025-09-07T07:34:42.6177183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6177273Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6177314Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6177481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6177522Z return func(*args, **kwargs) 2025-09-07T07:34:42.6177558Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6177742Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6177832Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.6177834Z 2025-09-07T07:34:42.6178039Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6178042Z 2025-09-07T07:34:42.6178043Z 2025-09-07T07:34:42.6179102Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6179298Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6179301Z 2025-09-07T07:34:42.6179386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6179461Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6179577Z inline_call [] 2025-09-07T07:34:42.6179651Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6179695Z inductor [] 2025-09-07T07:34:42.6179769Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6179841Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6180099Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6180214Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6180264Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6180417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6180505Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6180681Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6180800Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6180911Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6180954Z Traceback (most recent call last): 2025-09-07T07:34:42.6181104Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6181140Z self._run_test( 2025-09-07T07:34:42.6181253Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6181308Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6182319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6182453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6182503Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6182542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6182694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6182740Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6182779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6182937Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6182989Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6183025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6183167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6183248Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6183288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6183442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6183487Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6183637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6183691Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6183732Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6183872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6183923Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6183961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6185053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6185121Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6185166Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6185292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6185355Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6185396Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6185540Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6185583Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6185620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6185757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6185797Z return aot_autograd( 2025-09-07T07:34:42.6185852Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6186013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6186082Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6186128Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6186287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6186373Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6186417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6186662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6186706Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6186897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6186936Z fx_g = _create_graph( 2025-09-07T07:34:42.6187952Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6188117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6188152Z fx_g = make_fx( 2025-09-07T07:34:42.6188212Z ^^^^^^^^ 2025-09-07T07:34:42.6188365Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6188411Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6188448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6188595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6188639Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6188678Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6188837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6188875Z t = dispatch_trace( 2025-09-07T07:34:42.6188908Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6189022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6189066Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6189102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6189226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6189266Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6189300Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6189483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6189563Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6190566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6190692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6190731Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6190765Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6190891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6190935Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6190969Z ^^^^^^^^^ 2025-09-07T07:34:42.6191100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6191141Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6191175Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6191345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6191414Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6191447Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6191603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6191664Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6191710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6191884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6191924Z outs_pair = fn(*args) 2025-09-07T07:34:42.6191958Z ^^^^^^^^^ 2025-09-07T07:34:42.6192132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6192201Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6192245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6193382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6193422Z outs_pair = fn(*args) 2025-09-07T07:34:42.6193455Z ^^^^^^^^^ 2025-09-07T07:34:42.6193634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6193712Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6193755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6193948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6194023Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6194070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6194243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6194281Z outs_pair = fn(*args) 2025-09-07T07:34:42.6194315Z ^^^^^^^^^ 2025-09-07T07:34:42.6194504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6194553Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6194588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6194759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6194817Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6194856Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6194982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6195024Z return handle_torch_function( 2025-09-07T07:34:42.6195061Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6195202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6196333Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6196379Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6196608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6196649Z return func(*args, **kwargs) 2025-09-07T07:34:42.6196685Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6196810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6196904Z result = _engine_run_backward( 2025-09-07T07:34:42.6196940Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6197088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6197207Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6197257Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6197382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6197424Z return user_fn(self, *args) 2025-09-07T07:34:42.6197459Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6197605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6197650Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6197688Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6197850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6197894Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6197929Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6198053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6199095Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6199132Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6199297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6199349Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6199389Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6199529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6199577Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6199615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6199777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6199824Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6199866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6200024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6200062Z t = dispatch_trace( 2025-09-07T07:34:42.6200096Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6200276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6200340Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6200381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6200504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6200543Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6200577Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6200737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6200815Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6201829Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6201954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6201992Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6202026Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6202153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6202227Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6202262Z ^^^^^^^^^ 2025-09-07T07:34:42.6202411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6202461Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6202494Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6202535Z File "", line 1, in 2025-09-07T07:34:42.6202680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6202756Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6202801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6202937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6202988Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6203026Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6203219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6203262Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6203298Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6203469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6204488Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6204526Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6204670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6204711Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6204752Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6204891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6204980Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6205025Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6205151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6205213Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6205257Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6205382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6205421Z leaves = list(leaves) 2025-09-07T07:34:42.6205454Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6205596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6205634Z return func(x) 2025-09-07T07:34:42.6205667Z ^^^^^^^ 2025-09-07T07:34:42.6205806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6205870Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6205912Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6206078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6206120Z return func(*args, **kwargs) 2025-09-07T07:34:42.6207179Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6207363Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6207449Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6207481Z 2025-09-07T07:34:42.6207704Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6207707Z 2025-09-07T07:34:42.6207709Z 2025-09-07T07:34:42.6207783Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6207979Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6207982Z 2025-09-07T07:34:42.6208068Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6208142Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6208178Z inline_call [] 2025-09-07T07:34:42.6208234Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6208268Z inductor [] 2025-09-07T07:34:42.6208345Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6208418Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6208681Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6208794Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6208867Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6209019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6209105Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6209237Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6209360Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6209430Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6210445Z inline_call [] 2025-09-07T07:34:42.6210502Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6210575Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6210645Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6210902Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6211012Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6211063Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6211234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6211323Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6211453Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6211571Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6211620Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6211732Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6211776Z Traceback (most recent call last): 2025-09-07T07:34:42.6211927Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6211962Z self._run_test( 2025-09-07T07:34:42.6212075Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6212158Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6212200Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6212332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6212378Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6212417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6213538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6213586Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6213625Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6213763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6213809Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6213849Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6213994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6214075Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6214114Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6214266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6214330Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6214481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6214532Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6214573Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6214715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6214768Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6214807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6214923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6214988Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6215032Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6215159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6216188Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6216231Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6216370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6216438Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6216478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6216690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6216731Z return aot_autograd( 2025-09-07T07:34:42.6216766Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6216903Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6216974Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6217020Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6217179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6217262Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6217310Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6217536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6217579Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6217765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6217804Z fx_g = _create_graph( 2025-09-07T07:34:42.6217841Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6218006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6218040Z fx_g = make_fx( 2025-09-07T07:34:42.6218073Z ^^^^^^^^ 2025-09-07T07:34:42.6219207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6219257Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6219296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6219442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6219484Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6219521Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6219679Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6219743Z t = dispatch_trace( 2025-09-07T07:34:42.6219777Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6219890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6219930Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6219966Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6220091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6220133Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6220168Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6220331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6220409Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6220449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6220575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6220613Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6220647Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6221734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6221776Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6221833Z ^^^^^^^^^ 2025-09-07T07:34:42.6221969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6222010Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6222044Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6222191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6222242Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6222278Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6222434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6222495Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6222539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6222713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6222768Z outs_pair = fn(*args) 2025-09-07T07:34:42.6222817Z ^^^^^^^^^ 2025-09-07T07:34:42.6222990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6223056Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6223100Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6223273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6223312Z outs_pair = fn(*args) 2025-09-07T07:34:42.6223345Z ^^^^^^^^^ 2025-09-07T07:34:42.6223525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6224545Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6224591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6224786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6224857Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6224902Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6225076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6225132Z outs_pair = fn(*args) 2025-09-07T07:34:42.6225167Z ^^^^^^^^^ 2025-09-07T07:34:42.6225359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6225407Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6225446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6225617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6225662Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6225699Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6225823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6225867Z return handle_torch_function( 2025-09-07T07:34:42.6225904Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6226044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6226119Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6226163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6226345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6226387Z return func(*args, **kwargs) 2025-09-07T07:34:42.6227457Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6227581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6227624Z result = _engine_run_backward( 2025-09-07T07:34:42.6227659Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6227808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6227929Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6227978Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6228104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6228205Z return user_fn(self, *args) 2025-09-07T07:34:42.6228242Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6228387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6228430Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6228466Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6228625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6228672Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6228708Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6228832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6228871Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6228906Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6229074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6229126Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6230134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6230272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6230321Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6230386Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6230547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6230595Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6230635Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6230794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6230835Z t = dispatch_trace( 2025-09-07T07:34:42.6230870Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6230983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6231025Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6231061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6231184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6231226Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6231260Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6231422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6231499Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6231539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6231681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6231720Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6231754Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6232841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6232883Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6232918Z ^^^^^^^^^ 2025-09-07T07:34:42.6233069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6233118Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6233151Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6233193Z File "", line 1, in 
2025-09-07T07:34:42.6233336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6233414Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6233491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6233627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6233675Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6233712Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6233903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6233949Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6233985Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6234156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6234201Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6234239Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6234383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6234425Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6235417Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6235551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6235670Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6235715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6235841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6235901Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6235945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6236075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6236114Z leaves = list(leaves) 2025-09-07T07:34:42.6236148Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6236271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6236305Z return func(x) 2025-09-07T07:34:42.6236338Z ^^^^^^^ 2025-09-07T07:34:42.6236475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6236616Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6236658Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6236825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6236866Z return func(*args, **kwargs) 2025-09-07T07:34:42.6236926Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6237111Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6237196Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6237198Z 2025-09-07T07:34:42.6237405Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6237409Z 2025-09-07T07:34:42.6237411Z 2025-09-07T07:34:42.6238461Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6238655Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6238657Z 2025-09-07T07:34:42.6238744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6238867Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6238904Z inline_call [] 2025-09-07T07:34:42.6238960Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6238994Z inductor [] 2025-09-07T07:34:42.6239068Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6239140Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6239400Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6239510Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6239562Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6239713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6239803Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6239933Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6240052Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6240123Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6240220Z inline_call [] 2025-09-07T07:34:42.6240275Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6240350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6240419Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6241646Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6241761Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6241812Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6241962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6242047Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6242178Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6242297Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6242366Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6242400Z inline_call [] 2025-09-07T07:34:42.6242474Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6242549Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6242619Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6242870Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6242977Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6243027Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6243177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6243264Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6243394Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6243544Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6243761Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-cd74aa0771c0c80a.xml - 2025-09-07T07:34:42.6243819Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6244179Z FAILED [0.8014s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6245232Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6245234Z 2025-09-07T07:34:42.6245441Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6245447Z 2025-09-07T07:34:42.6245450Z 2025-09-07T07:34:42.6245522Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6245715Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.6245717Z 2025-09-07T07:34:42.6245800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6245879Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.6245944Z ================== 1 failed, 245 deselected, 2 rerun in 2.77s ================== 2025-09-07T07:34:42.6245979Z Got exit code 1 2025-09-07T07:34:42.6246102Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.6246589Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6246630Z import pkg_resources 2025-09-07T07:34:42.6246799Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-dbaf4ff3449f39d4.xml 2025-09-07T07:34:42.6246854Z ============================= test session starts ============================== 2025-09-07T07:34:42.6246970Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6247009Z cachedir: .pytest_cache 2025-09-07T07:34:42.6247164Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6247209Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6247267Z configfile: pytest.ini 2025-09-07T07:34:42.6247430Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6247505Z collecting ... collected 467 items / 69 deselected / 398 selected 2025-09-07T07:34:42.6247556Z stepcurrent: skipping 69 already run items. 
2025-09-07T07:34:42.6248573Z Running 177 items in this shard 2025-09-07T07:34:42.6248576Z 2025-09-07T07:34:42.6248756Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_False PASSED [2.0838s] [ 0%] 2025-09-07T07:34:42.6248956Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.6441s] [ 1%] 2025-09-07T07:34:42.6249151Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2861s] [ 1%] 2025-09-07T07:34:42.6249366Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True FAILED [0.2884s] [ 1%] 2025-09-07T07:34:42.6249368Z 2025-09-07T07:34:42.6249417Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6249530Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6249572Z Traceback (most recent call last): 2025-09-07T07:34:42.6249725Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6249762Z self._run_test( 2025-09-07T07:34:42.6249874Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6249930Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6249970Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6250107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6250156Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6250196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6250347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 
1775, in _wrapped_call_impl 2025-09-07T07:34:42.6250393Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6250431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6250588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6250631Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6251634Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6251778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6251861Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6251902Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6252057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6252103Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6252254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6252307Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6252349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6252492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6252543Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6252581Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6252715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6252785Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6252831Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6252956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6253020Z return 
lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6253061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6253201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6253245Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6253282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6253420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6254414Z return aot_autograd( 2025-09-07T07:34:42.6254474Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6254633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6254703Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6254748Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6254908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6254992Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6255037Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6255218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6255262Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6255454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6255494Z fx_g = _create_graph( 2025-09-07T07:34:42.6255529Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6255695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6255729Z 
fx_g = make_fx( 2025-09-07T07:34:42.6255778Z ^^^^^^^^ 2025-09-07T07:34:42.6255930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6255975Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6256013Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6256158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6256202Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6257283Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6257445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6257483Z t = dispatch_trace( 2025-09-07T07:34:42.6257517Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6257630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6257674Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6257708Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6257834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6257874Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6257911Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6258095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6258180Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6258220Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6258345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6258382Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6258417Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6258543Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6258587Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6258621Z ^^^^^^^^^ 2025-09-07T07:34:42.6258754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6258794Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6258830Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6259966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6260038Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6260071Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6260228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6260290Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6260335Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6260510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6260549Z outs_pair = fn(*args) 2025-09-07T07:34:42.6260584Z ^^^^^^^^^ 2025-09-07T07:34:42.6260756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6260826Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6260870Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6261042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6261082Z outs_pair = fn(*args) 2025-09-07T07:34:42.6261116Z ^^^^^^^^^ 2025-09-07T07:34:42.6261292Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6261373Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6261415Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6261612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6261685Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6261732Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6261904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6262909Z outs_pair = fn(*args) 2025-09-07T07:34:42.6262943Z ^^^^^^^^^ 2025-09-07T07:34:42.6263134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6263181Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6263218Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6263386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6263449Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6263487Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6263615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6263656Z return handle_torch_function( 2025-09-07T07:34:42.6263693Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6263833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6263910Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.6263957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6264124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6264164Z return func(*args, **kwargs) 2025-09-07T07:34:42.6264200Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6264325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6264398Z result = _engine_run_backward( 2025-09-07T07:34:42.6264435Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6264580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6265677Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6265730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6265856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6265897Z return user_fn(self, *args) 2025-09-07T07:34:42.6265932Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6266076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6266122Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6266159Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6266317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6266360Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6266397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6266588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6266664Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.6266698Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6266866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6266918Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6266958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6267096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6267146Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6267184Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6267345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6267391Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6268408Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6268568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6268607Z t = dispatch_trace( 2025-09-07T07:34:42.6268640Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6268754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6268824Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6268863Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6268987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6269025Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6269060Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6269220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6269301Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6269341Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6269465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6269503Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6269537Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6269664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6269743Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6269777Z ^^^^^^^^^ 2025-09-07T07:34:42.6269927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6269976Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6270971Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6271014Z File "<string>", line 1, in <lambda> 2025-09-07T07:34:42.6271160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6271237Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6271283Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6271419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6271469Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6271508Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6271699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6271741Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6271777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6271967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6272011Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6272048Z ^^^^^^^^^^^^
2025-09-07T07:34:42.6272190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6272232Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6272270Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6272405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6272493Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6272539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6272665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6272727Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6273730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6273857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6273895Z leaves = list(leaves) 2025-09-07T07:34:42.6273930Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6274069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6274107Z return func(x) 2025-09-07T07:34:42.6274140Z ^^^^^^^ 2025-09-07T07:34:42.6274278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.6274342Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6274383Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6274554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6274595Z return func(*args, **kwargs) 2025-09-07T07:34:42.6274631Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6274811Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6274897Z
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6274920Z 2025-09-07T07:34:42.6275141Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6275144Z 2025-09-07T07:34:42.6275146Z 2025-09-07T07:34:42.6275218Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6275415Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6275419Z 2025-09-07T07:34:42.6275503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6275576Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6275611Z inline_call [] 2025-09-07T07:34:42.6276709Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6276788Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6276865Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6277126Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6277238Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6277318Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6277469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6277555Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6277686Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6277806Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6277923Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6277966Z Traceback (most recent call last): 2025-09-07T07:34:42.6278115Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6278150Z self._run_test( 2025-09-07T07:34:42.6278262Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6278317Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6278358Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6278490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6278537Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6278592Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6278745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6278792Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6279804Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6279944Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6279988Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6280028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6280224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6280306Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6280343Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6280498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6280584Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6280735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6280787Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6280828Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6280969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6281021Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6281058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6281176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6281241Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6281286Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6281418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6281481Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6281522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6282622Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6282684Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6282721Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6282858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6282897Z return aot_autograd( 2025-09-07T07:34:42.6282932Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6283072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6283144Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6283191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6283350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6283434Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6283478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6283662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6283705Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6283890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6283944Z fx_g = _create_graph( 2025-09-07T07:34:42.6283980Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6284145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6284179Z fx_g = make_fx( 2025-09-07T07:34:42.6284213Z ^^^^^^^^ 2025-09-07T07:34:42.6284365Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6284413Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6285409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6285558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6285600Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6285636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6285795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6285851Z t = dispatch_trace( 2025-09-07T07:34:42.6285899Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6286014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6286055Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6286091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6286215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6286257Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6286292Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6286455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6286618Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6286659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6286790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6286828Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6286863Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6286989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6287030Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6287064Z ^^^^^^^^^ 2025-09-07T07:34:42.6288201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6288242Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6288278Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6288428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6288479Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6288515Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6288675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6288737Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6288781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6288955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6288997Z outs_pair = fn(*args) 2025-09-07T07:34:42.6289031Z ^^^^^^^^^ 2025-09-07T07:34:42.6289202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6289267Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6289312Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6289515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6289554Z outs_pair = fn(*args) 2025-09-07T07:34:42.6289587Z ^^^^^^^^^ 2025-09-07T07:34:42.6289768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6289827Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6289872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6291034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6291106Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6291151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6291347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6291404Z outs_pair = fn(*args) 2025-09-07T07:34:42.6291438Z ^^^^^^^^^ 2025-09-07T07:34:42.6291627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6291673Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6291711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6291880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6291926Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6291962Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6292089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6292134Z return handle_torch_function( 2025-09-07T07:34:42.6292171Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6292312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6292387Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6292431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6292598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6292655Z return func(*args, **kwargs) 2025-09-07T07:34:42.6292692Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6292816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6293822Z result = _engine_run_backward( 2025-09-07T07:34:42.6293861Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6294012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6294133Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6294182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6294308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6294353Z return user_fn(self, *args) 2025-09-07T07:34:42.6294388Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6294534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6294576Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6294612Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6294790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6294838Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6294874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6294996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6295036Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6295070Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6295238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6295289Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6295329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6295463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6296466Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6296598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6296786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6296833Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6296873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6297031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6297070Z t = dispatch_trace( 2025-09-07T07:34:42.6297104Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6297219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6297261Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6297297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6297421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6297462Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6297497Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6297659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6297736Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6297777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6297923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6297961Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6297995Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6298121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6298163Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6299178Z ^^^^^^^^^ 2025-09-07T07:34:42.6299333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6299381Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6299415Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6299456Z File "<string>", line 1, in <lambda> 2025-09-07T07:34:42.6299600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6299677Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6299725Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6299860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6299907Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6299944Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6300156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6300200Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6300236Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6300406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6300450Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6300487Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6300631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6300673Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6300710Z ^^^^^^^^^^^^^^^^^^^^^
2025-09-07T07:34:42.6300844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6301909Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6301973Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6302099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6302158Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6302202Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6302328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6302366Z leaves = list(leaves) 2025-09-07T07:34:42.6302400Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6302521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6302557Z return func(x) 2025-09-07T07:34:42.6302589Z ^^^^^^^ 2025-09-07T07:34:42.6302729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6302794Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6302835Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6303002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6303043Z return func(*args, **kwargs) 2025-09-07T07:34:42.6303094Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6303276Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6303361Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6303363Z 2025-09-07T07:34:42.6303570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6303574Z 2025-09-07T07:34:42.6303576Z 2025-09-07T07:34:42.6303649Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6303846Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6303849Z 2025-09-07T07:34:42.6304897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6304974Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6305010Z inline_call [] 2025-09-07T07:34:42.6305067Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6305140Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6305213Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6305488Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6305601Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6305652Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6305803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6305890Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6306022Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6306140Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6306211Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6306245Z inline_call [] 2025-09-07T07:34:42.6306318Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6306403Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6306473Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6306812Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6306923Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6306973Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6308098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6308184Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6308314Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6308438Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6308488Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6308599Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6308643Z Traceback (most recent call last): 2025-09-07T07:34:42.6308817Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6308853Z self._run_test( 2025-09-07T07:34:42.6308965Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6309019Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6309059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6309194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6309239Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6309279Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6309428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6309474Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6309513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6309648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6309692Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6309730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6309872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6310938Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6310980Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6311131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6311177Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6311325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6311381Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6311421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6311564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6311614Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6311653Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6311806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6311873Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6311916Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6312042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6312105Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6312149Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6312288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6312331Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6312368Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6312506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6312546Z return aot_autograd( 2025-09-07T07:34:42.6313541Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6313679Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6313748Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6313793Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6313954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6314062Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6314108Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6314289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6314335Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6314521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6314560Z fx_g = _create_graph( 2025-09-07T07:34:42.6314595Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6314758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6314795Z fx_g = make_fx( 2025-09-07T07:34:42.6314827Z ^^^^^^^^ 2025-09-07T07:34:42.6314977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6315022Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6315061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6315219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6315264Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6315300Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6315459Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6316453Z t = dispatch_trace( 2025-09-07T07:34:42.6316588Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6316702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6316746Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6316780Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6316909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6316948Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6316984Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6317147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6317275Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6317316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6317440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6317478Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6317513Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6317639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6317680Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6317715Z ^^^^^^^^^ 2025-09-07T07:34:42.6317846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6317886Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6317922Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6318074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6318123Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6319144Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6319301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6319363Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6319431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6319607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6319646Z outs_pair = fn(*args) 2025-09-07T07:34:42.6319681Z ^^^^^^^^^ 2025-09-07T07:34:42.6319854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6319925Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6319969Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6320201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6320239Z outs_pair = fn(*args) 2025-09-07T07:34:42.6320274Z ^^^^^^^^^ 2025-09-07T07:34:42.6320454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6320512Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6320554Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6320777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6320851Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6320897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6321068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6321106Z outs_pair = fn(*args) 2025-09-07T07:34:42.6322115Z ^^^^^^^^^ 2025-09-07T07:34:42.6322309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6322354Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6322390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6322559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6322626Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6322678Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6322803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6322845Z return handle_torch_function( 2025-09-07T07:34:42.6322881Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6323022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6323097Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6323142Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6323308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6323349Z return func(*args, **kwargs) 2025-09-07T07:34:42.6323385Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6323512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6323554Z result = _engine_run_backward( 2025-09-07T07:34:42.6323589Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6323737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6323858Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6323925Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6325013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6325055Z return user_fn(self, *args) 2025-09-07T07:34:42.6325091Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6325237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6325284Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6325320Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6325478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6325521Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6325558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6325685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6325724Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6325759Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6325924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6325974Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6326030Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6326171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6326221Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6326259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6326419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6326469Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6326583Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6327728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6327768Z t = dispatch_trace( 2025-09-07T07:34:42.6327803Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6327919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6327989Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6328046Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6328171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6328209Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6328244Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6328404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6328482Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6328522Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6328646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6328684Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6328718Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6328848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6328890Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6328923Z ^^^^^^^^^ 2025-09-07T07:34:42.6329072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6329120Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6329153Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6329216Z File "", line 1, in 
2025-09-07T07:34:42.6330325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6330405Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6330450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6330589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6330639Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6330677Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6330870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6330914Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6330949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6331122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6331165Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6331202Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6331344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6331407Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6331446Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6331581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6331668Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6331714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6331838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6331899Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6331941Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6333023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6333062Z leaves = list(leaves) 2025-09-07T07:34:42.6333097Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6333262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6333298Z return func(x) 2025-09-07T07:34:42.6333331Z ^^^^^^^ 2025-09-07T07:34:42.6333471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6333536Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6333576Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6333745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6333786Z return func(*args, **kwargs) 2025-09-07T07:34:42.6333822Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6334002Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6334089Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6334093Z 2025-09-07T07:34:42.6334299Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6334301Z 2025-09-07T07:34:42.6334304Z 2025-09-07T07:34:42.6334376Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6334571Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6334591Z 2025-09-07T07:34:42.6334678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6334752Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6334786Z inline_call [] 2025-09-07T07:34:42.6334843Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6334918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6335951Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6336208Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6336320Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6336374Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6336604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6336690Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6336842Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6336966Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6337038Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6337072Z inline_call [] 2025-09-07T07:34:42.6337128Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6337201Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6337270Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6337525Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6337634Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6337685Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6337852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6337956Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6338085Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6338203Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6338274Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6339282Z inline_call [] 2025-09-07T07:34:42.6339338Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6339409Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6339478Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6339735Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6339845Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6339895Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6340047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6340159Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6340287Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6340405Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6340624Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-dbaf4ff3449f39d4.xml - 2025-09-07T07:34:42.6340683Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6341049Z FAILED [0.2884s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6341133Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6341136Z 2025-09-07T07:34:42.6341341Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6341343Z 2025-09-07T07:34:42.6341345Z 2025-09-07T07:34:42.6341417Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6341624Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6341629Z 2025-09-07T07:34:42.6341713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6341771Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6341841Z ============= 1 failed, 1 passed, 69 deselected, 2 rerun in 3.49s ============== 2025-09-07T07:34:42.6342838Z Got exit code 1 2025-09-07T07:34:42.6342880Z Retrying single test... 2025-09-07T07:34:42.6343307Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6343344Z import pkg_resources 2025-09-07T07:34:42.6343516Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-b0ba4413c8f4c735.xml 2025-09-07T07:34:42.6343599Z ============================= test session starts ============================== 2025-09-07T07:34:42.6343714Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6343754Z cachedir: .pytest_cache 2025-09-07T07:34:42.6343909Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6343954Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6343992Z configfile: pytest.ini 2025-09-07T07:34:42.6344157Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6344234Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6344470Z stepcurrent: skipping 70 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6344514Z Running 1 items in this shard 2025-09-07T07:34:42.6344516Z 2025-09-07T07:34:42.6344716Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.6242s] [100%] 2025-09-07T07:34:42.6344911Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2877s] [100%] 2025-09-07T07:34:42.6345099Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True FAILED [0.2776s] [100%] 2025-09-07T07:34:42.6345101Z 2025-09-07T07:34:42.6345150Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6345263Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6345308Z Traceback (most recent call last): 2025-09-07T07:34:42.6345462Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6346459Z self._run_test( 2025-09-07T07:34:42.6346632Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6346687Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6346727Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6346864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6346911Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6346949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6347099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6347170Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6347212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6347349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6347394Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6347430Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6347574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6347655Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6347694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6347846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6347893Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6348042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6348135Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6348176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6349297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6349349Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6349388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6349504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6349570Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6349614Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6349741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6349805Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6349851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6349991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6350034Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6350073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6350211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6350278Z return aot_autograd( 2025-09-07T07:34:42.6350313Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6350449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6350517Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6350565Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6350728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6350811Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6350855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6351037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6352051Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6352240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6352279Z fx_g = _create_graph( 2025-09-07T07:34:42.6352314Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6352501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6352538Z fx_g = make_fx( 
2025-09-07T07:34:42.6352571Z ^^^^^^^^ 2025-09-07T07:34:42.6352725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6352770Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6352808Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6352953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6352998Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6353034Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6353193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6353230Z t = dispatch_trace( 2025-09-07T07:34:42.6353265Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6353379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6353461Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6353498Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6353623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6353663Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6353698Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6354835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6354914Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6354956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6355080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6355121Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6355158Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6355285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6355326Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6355360Z ^^^^^^^^^ 2025-09-07T07:34:42.6355493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6355533Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6355591Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6355742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6355791Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6355825Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6355982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6356045Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6356089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6356265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6356303Z outs_pair = fn(*args) 2025-09-07T07:34:42.6356338Z ^^^^^^^^^ 2025-09-07T07:34:42.6357644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6357716Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6357759Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6357933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6357997Z outs_pair = fn(*args) 2025-09-07T07:34:42.6358033Z ^^^^^^^^^ 2025-09-07T07:34:42.6358211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6358269Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6358312Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6358506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6358578Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6358623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6358800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6358839Z outs_pair = fn(*args) 2025-09-07T07:34:42.6358892Z ^^^^^^^^^ 2025-09-07T07:34:42.6359100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6359146Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6359182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6359352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6359399Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6359436Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6359561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6360648Z return handle_torch_function( 2025-09-07T07:34:42.6360686Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6360832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6360911Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6360956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6361125Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6361166Z return func(*args, **kwargs) 2025-09-07T07:34:42.6361227Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6361350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6361391Z result = _engine_run_backward( 2025-09-07T07:34:42.6361427Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6361573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6361697Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6361749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6361878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6361920Z return user_fn(self, *args) 2025-09-07T07:34:42.6361956Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6362102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6362144Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6362181Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6362338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6363350Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6363405Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6363532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6363571Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6363608Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6363772Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6363824Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6363867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6364003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6364051Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6364089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6364250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6364328Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6364367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6364525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6364563Z t = dispatch_trace( 2025-09-07T07:34:42.6364598Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6364711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6364756Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6364792Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6364916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6364955Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6366027Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6366193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6366273Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6366314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6366438Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6366566Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6366629Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6366757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6366797Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6366832Z ^^^^^^^^^ 2025-09-07T07:34:42.6366980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6367032Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6367067Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6367110Z File "", line 1, in 2025-09-07T07:34:42.6367253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6367331Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6367376Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6367515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6367561Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6367599Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6367790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6368842Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6368881Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6369058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6369101Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6369139Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6369284Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6369328Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6369364Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6369497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6369586Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6369632Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6369796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6369856Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6369899Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6370025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6370064Z leaves = list(leaves) 2025-09-07T07:34:42.6370099Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6370224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6370260Z return func(x) 2025-09-07T07:34:42.6370293Z ^^^^^^^ 2025-09-07T07:34:42.6370431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6370497Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6371501Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6371672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6371713Z return func(*args, **kwargs) 2025-09-07T07:34:42.6371750Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6371932Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6372038Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.6372041Z 2025-09-07T07:34:42.6372250Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6372252Z 2025-09-07T07:34:42.6372254Z 2025-09-07T07:34:42.6372330Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6372530Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6372533Z 2025-09-07T07:34:42.6372618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6372691Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6372726Z inline_call [] 2025-09-07T07:34:42.6372782Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6372821Z inductor [] 2025-09-07T07:34:42.6372893Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6372966Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6373245Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6373398Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6373450Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6373606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6373692Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6374793Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6374916Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6375027Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6375070Z Traceback (most recent call last): 2025-09-07T07:34:42.6375221Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6375274Z self._run_test( 2025-09-07T07:34:42.6375401Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6375458Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6375500Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6375631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6375680Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6375718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6375908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6375955Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6375993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6376130Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6376176Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6376214Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6376356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6376437Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6376567Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6376746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6377823Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6377977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6378029Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6378074Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6378216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6378311Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6378350Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6378466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6378534Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6378578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6378704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6378768Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6378808Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6378972Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6379015Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6379052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6379190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6379229Z return aot_autograd( 2025-09-07T07:34:42.6379263Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6379400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6379469Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6380486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6380647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6380757Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6380819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6381003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6381046Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6381234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6381276Z fx_g = _create_graph( 2025-09-07T07:34:42.6381311Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6381477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6381511Z fx_g = make_fx( 2025-09-07T07:34:42.6381544Z ^^^^^^^^ 2025-09-07T07:34:42.6381699Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6381747Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6381785Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6381932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6381975Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6382028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6382187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6382225Z t = dispatch_trace( 2025-09-07T07:34:42.6382259Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6382373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6383424Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6383466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6383592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6383632Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6383667Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6383830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6383955Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6384000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6384124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6384162Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6384198Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6384342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6384386Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6384420Z ^^^^^^^^^ 2025-09-07T07:34:42.6384553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6384593Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6384628Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6384777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6384827Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6384861Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6385020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6385081Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6386178Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6386389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6386429Z outs_pair = fn(*args) 2025-09-07T07:34:42.6386464Z ^^^^^^^^^ 2025-09-07T07:34:42.6386713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6386779Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6386825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6386998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6387037Z outs_pair = fn(*args) 2025-09-07T07:34:42.6387071Z ^^^^^^^^^ 2025-09-07T07:34:42.6387249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6387312Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6387356Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6387549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6387621Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6387689Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6387860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6387899Z outs_pair = fn(*args) 2025-09-07T07:34:42.6387932Z ^^^^^^^^^ 2025-09-07T07:34:42.6388124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6388171Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6389193Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6389363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6389409Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6389445Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6389573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6389614Z return handle_torch_function( 2025-09-07T07:34:42.6389652Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6389793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6389891Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6389940Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6390108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6390148Z return func(*args, **kwargs) 2025-09-07T07:34:42.6390183Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6390308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6390350Z result = _engine_run_backward( 2025-09-07T07:34:42.6390385Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6390531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6390651Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6390701Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6390930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6390972Z return user_fn(self, *args) 2025-09-07T07:34:42.6391008Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6392124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6392168Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6392206Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6392363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6392407Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6392442Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6392627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6392668Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6392705Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6392876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6392927Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6392967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6393152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6393229Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6393267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6393428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6393476Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6393516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6393676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6393715Z t = dispatch_trace( 2025-09-07T07:34:42.6393795Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6394875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6394917Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6394956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6395080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6395119Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6395153Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6395313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6395412Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6395455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6395579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6395618Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6395652Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6395779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6395822Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6395855Z ^^^^^^^^^ 2025-09-07T07:34:42.6396006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6396054Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6396088Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6396129Z File "", line 1, in 2025-09-07T07:34:42.6396287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6396379Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6396424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6397652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6397702Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6397741Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6397979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6398023Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6398058Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6398229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6398277Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6398313Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6398457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6398498Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6398534Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6398696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6398784Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6398866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6399102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6399164Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6399208Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6399334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6399373Z leaves = list(leaves) 2025-09-07T07:34:42.6399408Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6399531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6400610Z return func(x) 2025-09-07T07:34:42.6400642Z ^^^^^^^ 2025-09-07T07:34:42.6400780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6400844Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6400887Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6401080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6401180Z return func(*args, **kwargs) 2025-09-07T07:34:42.6401215Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6401398Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6401483Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6401486Z 2025-09-07T07:34:42.6401694Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6401696Z 2025-09-07T07:34:42.6401699Z 2025-09-07T07:34:42.6401771Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6402014Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6402035Z 2025-09-07T07:34:42.6402140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6402215Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6402250Z inline_call [] 2025-09-07T07:34:42.6402308Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6402341Z inductor [] 2025-09-07T07:34:42.6402415Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6402488Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6402747Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6403835Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6403890Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6404047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6404134Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6404264Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6404383Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6404474Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6404509Z inline_call [] 2025-09-07T07:34:42.6404564Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6404637Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6404707Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6404963Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6405071Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6405122Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6405271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6405359Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6405527Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6405646Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6405710Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6405825Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6405868Z Traceback (most recent call last): 2025-09-07T07:34:42.6407125Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6407162Z self._run_test( 2025-09-07T07:34:42.6407276Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6407333Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6407374Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6407506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6407552Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6407590Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6407770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6407834Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6407872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6408008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6408051Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6408088Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6408232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6408313Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6408351Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6408504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6408552Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6408703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6408756Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6408797Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6409964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6410041Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6410079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6410198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6410263Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6410307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6410438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6410547Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6410588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6410728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6410772Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6410812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6410948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6410988Z return aot_autograd( 2025-09-07T07:34:42.6411023Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6411157Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6411242Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6411291Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6411452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6411535Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6411580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6411765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6412831Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6413017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6413057Z fx_g = _create_graph( 2025-09-07T07:34:42.6413092Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6413346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6413381Z fx_g = make_fx( 2025-09-07T07:34:42.6413414Z ^^^^^^^^ 2025-09-07T07:34:42.6413565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6413611Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6413649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6413794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6413837Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6413874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6414032Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6414074Z t = dispatch_trace( 2025-09-07T07:34:42.6414151Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6414265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6414306Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6414342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6414467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6414523Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6415527Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6415690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6415769Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6415810Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6415935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6415977Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6416012Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6416137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6416179Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6416258Z ^^^^^^^^^ 2025-09-07T07:34:42.6416392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6416435Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6416470Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6416707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6416758Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6416791Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6416980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6417042Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6417087Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6417260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6417302Z outs_pair = fn(*args) 2025-09-07T07:34:42.6417337Z ^^^^^^^^^ 2025-09-07T07:34:42.6418532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6418664Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6418710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6418910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6418968Z outs_pair = fn(*args) 2025-09-07T07:34:42.6419002Z ^^^^^^^^^ 2025-09-07T07:34:42.6419180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6419239Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6419282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6419476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6419547Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6419592Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6419817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6419860Z outs_pair = fn(*args) 2025-09-07T07:34:42.6419894Z ^^^^^^^^^ 2025-09-07T07:34:42.6420084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6420129Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6420167Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6420359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6420405Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6420441Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6421597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6421642Z return handle_torch_function( 2025-09-07T07:34:42.6421681Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6421825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6421901Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6421945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6422114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6422157Z return func(*args, **kwargs) 2025-09-07T07:34:42.6422193Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6422316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6422358Z result = _engine_run_backward( 2025-09-07T07:34:42.6422392Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6422557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6422681Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6422730Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6422855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6422900Z return user_fn(self, *args) 2025-09-07T07:34:42.6422936Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6423079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6423121Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6423157Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6423317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6424412Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6424450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6424575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6424614Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6424649Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6424815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6424867Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6424907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6425043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6425094Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6425134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6425341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6425388Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6425428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6425586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6425640Z t = dispatch_trace( 2025-09-07T07:34:42.6425674Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6425789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6425831Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6425867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6425993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6427146Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6427184Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6427348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6427500Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6427541Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6427665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6427706Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6427740Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6427866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6427907Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6427941Z ^^^^^^^^^ 2025-09-07T07:34:42.6428114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6428166Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6428201Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6428242Z File "", line 1, in 
2025-09-07T07:34:42.6428387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6428465Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6428514Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6428649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6428696Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6428733Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6428980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6430092Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6430130Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6430302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6430345Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6430382Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6430526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6430568Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6430604Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6430737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6430827Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6430876Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6431002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6431061Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6431103Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6431229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6431287Z leaves = list(leaves) 2025-09-07T07:34:42.6431323Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6431446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6431481Z return func(x) 2025-09-07T07:34:42.6431514Z ^^^^^^^ 2025-09-07T07:34:42.6431699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6431766Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6432779Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6432991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6433032Z return func(*args, **kwargs) 2025-09-07T07:34:42.6433067Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6433252Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6433336Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6433338Z 2025-09-07T07:34:42.6433544Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6433565Z 2025-09-07T07:34:42.6433567Z 2025-09-07T07:34:42.6433641Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6433838Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6433840Z 2025-09-07T07:34:42.6433926Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6434002Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6434037Z inline_call [] 2025-09-07T07:34:42.6434094Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6434127Z inductor [] 2025-09-07T07:34:42.6434201Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6434272Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6434551Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6434677Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6434729Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6434879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6434967Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6436064Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6436237Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6436308Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6436345Z inline_call [] 2025-09-07T07:34:42.6436403Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6436477Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6436617Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6436872Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6437009Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6437059Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6437209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6437295Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6437425Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6437546Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6437615Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6437650Z inline_call [] 2025-09-07T07:34:42.6437705Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6437776Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6437845Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6438144Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6438252Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6439354Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6439506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6439592Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6439720Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6439838Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6440059Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-b0ba4413c8f4c735.xml - 2025-09-07T07:34:42.6440118Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6440794Z FAILED [0.2776s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6440952Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6440955Z 2025-09-07T07:34:42.6441160Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6441164Z 2025-09-07T07:34:42.6441165Z 2025-09-07T07:34:42.6441237Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6441433Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6441436Z 2025-09-07T07:34:42.6441520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6441625Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6441696Z ================== 1 failed, 245 deselected, 2 rerun in 1.39s ================== 2025-09-07T07:34:42.6441731Z Got exit code 1 2025-09-07T07:34:42.6441770Z Retrying single test... 2025-09-07T07:34:42.6442192Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6442248Z import pkg_resources 2025-09-07T07:34:42.6442417Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5fcc49b93a6b85e4.xml 2025-09-07T07:34:42.6442520Z ============================= test session starts ============================== 2025-09-07T07:34:42.6442635Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6443701Z cachedir: .pytest_cache 2025-09-07T07:34:42.6443860Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6443905Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6443943Z configfile: pytest.ini 2025-09-07T07:34:42.6444106Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6444229Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6444465Z stepcurrent: skipping 70 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6444508Z Running 1 items in this shard 2025-09-07T07:34:42.6444510Z 2025-09-07T07:34:42.6444730Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.5955s] [100%] 2025-09-07T07:34:42.6444928Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2819s] [100%] 2025-09-07T07:34:42.6445137Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True FAILED [0.2774s] [100%] 2025-09-07T07:34:42.6445141Z 2025-09-07T07:34:42.6445190Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6445304Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6445346Z Traceback (most recent call last): 2025-09-07T07:34:42.6445503Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6445539Z self._run_test( 2025-09-07T07:34:42.6445690Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6445746Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6445787Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6445921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6445969Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6447134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6447289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6447336Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6447375Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6447512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6447562Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6447601Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6447744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6447825Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6447865Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6448115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6448190Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6448340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6448394Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6448435Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6448578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6448631Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6448669Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6448787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6448853Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6448899Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6449026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6450074Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6450117Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6450327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6450372Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6450411Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6450549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6450589Z return aot_autograd( 2025-09-07T07:34:42.6450624Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6450759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6450830Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6450876Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6451036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6451121Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6451185Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6451386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6451430Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6451652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6451694Z fx_g = _create_graph( 2025-09-07T07:34:42.6451730Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6451893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6451928Z fx_g = make_fx( 
2025-09-07T07:34:42.6452937Z ^^^^^^^^ 2025-09-07T07:34:42.6453093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6453141Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6453182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6453330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6453373Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6453409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6453622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6453681Z t = dispatch_trace( 2025-09-07T07:34:42.6453715Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6453829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6453870Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6453906Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6454032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6454074Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6454108Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6454271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6454349Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6454391Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6454516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6454555Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6454589Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6455683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6455725Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6455790Z ^^^^^^^^^ 2025-09-07T07:34:42.6455976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6456017Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6456051Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6456201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6456252Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6456288Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6456445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6456578Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6456623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6456802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6456883Z outs_pair = fn(*args) 2025-09-07T07:34:42.6456918Z ^^^^^^^^^ 2025-09-07T07:34:42.6457090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6457202Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6457247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6457422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6457461Z outs_pair = fn(*args) 2025-09-07T07:34:42.6457494Z ^^^^^^^^^ 2025-09-07T07:34:42.6458736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6458802Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6458846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6459041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6459111Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6459156Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6459354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6459392Z outs_pair = fn(*args) 2025-09-07T07:34:42.6459427Z ^^^^^^^^^ 2025-09-07T07:34:42.6459665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6459711Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6459750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6459919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6459964Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6460001Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6460126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6460171Z return handle_torch_function( 2025-09-07T07:34:42.6460207Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6460347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6460422Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6460482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6460704Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6461719Z return func(*args, **kwargs) 2025-09-07T07:34:42.6461756Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6461879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6461921Z result = _engine_run_backward( 2025-09-07T07:34:42.6461959Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6462105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6462273Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6462323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6462451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6462529Z return user_fn(self, *args) 2025-09-07T07:34:42.6462566Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6462712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6462754Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6462791Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6462950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6462994Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6463031Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6463203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6463242Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6463279Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6463449Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6464468Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6464508Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6464693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6464762Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6464800Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6464961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6465009Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6465047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6465204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6465247Z t = dispatch_trace( 2025-09-07T07:34:42.6465281Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6465432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6465475Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6465512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6465636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6465678Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6465712Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6465873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6465996Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6466051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6466178Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6466217Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6467354Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6467485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6467525Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6467563Z ^^^^^^^^^ 2025-09-07T07:34:42.6467715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6467764Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6467797Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6467840Z File "", line 1, in 2025-09-07T07:34:42.6467983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6468106Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6468151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6468286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6468334Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6468371Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6468566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6468609Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6468645Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6468815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6468863Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6468900Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6469044Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6469086Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6470090Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6470225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6470337Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6470382Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6470509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6470568Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6470613Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6470742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6470781Z leaves = list(leaves) 2025-09-07T07:34:42.6470815Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6470938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6470972Z return func(x) 2025-09-07T07:34:42.6471005Z ^^^^^^^ 2025-09-07T07:34:42.6471145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6471210Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6471251Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6471417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6471478Z return func(*args, **kwargs) 2025-09-07T07:34:42.6471516Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6471700Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6471785Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.6471787Z 2025-09-07T07:34:42.6473052Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6473058Z 2025-09-07T07:34:42.6473059Z 2025-09-07T07:34:42.6473135Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6473332Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6473335Z 2025-09-07T07:34:42.6473421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6473528Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6473565Z inline_call [] 2025-09-07T07:34:42.6473621Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6473655Z inductor [] 2025-09-07T07:34:42.6473729Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6473861Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6474119Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6474232Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6474283Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6474434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6474524Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6474654Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6474774Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6474885Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6474953Z Traceback (most recent call last): 2025-09-07T07:34:42.6475102Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6475137Z self._run_test( 2025-09-07T07:34:42.6475248Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6476331Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6476421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6476630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6476676Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6476715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6476865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6476914Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6476952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6477088Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6477131Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6477169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6477336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6477468Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6477506Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6477659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6477703Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6477855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6477907Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6477947Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6478088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6478140Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6479226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6479400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6479466Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6479511Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6479636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6479699Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6479741Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6479925Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6479970Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6480006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6480208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6480247Z return aot_autograd( 2025-09-07T07:34:42.6480283Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6480418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6480487Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6480531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6480712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6480795Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6480841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6481023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6481066Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6481300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6482365Z fx_g = _create_graph( 2025-09-07T07:34:42.6482401Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6482565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6482602Z fx_g = make_fx( 2025-09-07T07:34:42.6482634Z ^^^^^^^^ 2025-09-07T07:34:42.6482785Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6482832Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6482869Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6483034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6483078Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6483114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6483274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6483312Z t = dispatch_trace( 2025-09-07T07:34:42.6483349Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6483462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6483506Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6483541Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6483667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6483706Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6483759Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6483936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6484982Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6485023Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6485149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6485190Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6485225Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6485350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6485392Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6485426Z ^^^^^^^^^ 2025-09-07T07:34:42.6485604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6485646Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6485683Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6485832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6485882Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6485915Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6486072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6486187Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6486232Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6486408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6486447Z outs_pair = fn(*args) 2025-09-07T07:34:42.6486604Z ^^^^^^^^^ 2025-09-07T07:34:42.6486778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6486846Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6487869Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6488046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6488087Z outs_pair = fn(*args) 2025-09-07T07:34:42.6488122Z ^^^^^^^^^ 2025-09-07T07:34:42.6488298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6488358Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6488400Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6488622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6488692Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6488739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6488913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6488953Z outs_pair = fn(*args) 2025-09-07T07:34:42.6488987Z ^^^^^^^^^ 2025-09-07T07:34:42.6489285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6489330Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6489368Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6489555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6489620Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6489656Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6489782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6489823Z return handle_torch_function( 2025-09-07T07:34:42.6490890Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6491033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6491109Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6491153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6491324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6491367Z return func(*args, **kwargs) 2025-09-07T07:34:42.6491457Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6491580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6491622Z result = _engine_run_backward( 2025-09-07T07:34:42.6491657Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6491802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6491946Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6491994Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6492123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6492164Z return user_fn(self, *args) 2025-09-07T07:34:42.6492204Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6492349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6492393Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6492429Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6492587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6492665Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6492703Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6493840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6493881Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6493915Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6494098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6494152Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6494192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6494328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6494377Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6494415Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6494575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6494624Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6494663Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6494820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6494859Z t = dispatch_trace( 2025-09-07T07:34:42.6494893Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6495041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6495085Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6495121Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6495247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6495284Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6495320Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6495480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6496653Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6496694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6496865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6496907Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6496944Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6497070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6497112Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6497146Z ^^^^^^^^^ 2025-09-07T07:34:42.6497296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6497372Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6497406Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6497448Z File "<string>", line 1, in <lambda> 2025-09-07T07:34:42.6497592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6497669Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6497716Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6497854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6497901Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6497938Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6498130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6498174Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6498210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6499359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6499405Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6499442Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6499613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6499695Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6499730Z ^^^^^^^^^^^^^^^^^^^^^
2025-09-07T07:34:42.6499864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6499953Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6500000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6500128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6500188Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6500230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6500358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6500413Z leaves = list(leaves) 2025-09-07T07:34:42.6500464Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6500586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6500622Z return func(x) 2025-09-07T07:34:42.6500654Z ^^^^^^^ 2025-09-07T07:34:42.6500792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.6500858Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6500899Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6501114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6502226Z return func(*args, **kwargs) 2025-09-07T07:34:42.6502264Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6502448Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6502538Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6502541Z 2025-09-07T07:34:42.6502749Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6502751Z 2025-09-07T07:34:42.6502753Z 2025-09-07T07:34:42.6502845Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6503042Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6503044Z 2025-09-07T07:34:42.6503176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6503251Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6503289Z inline_call [] 2025-09-07T07:34:42.6503347Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6503380Z inductor [] 2025-09-07T07:34:42.6503454Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6503526Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6503785Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6503900Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6503951Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6504102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6504201Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6504384Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6504504Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6505539Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6505576Z inline_call [] 2025-09-07T07:34:42.6505677Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6505752Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6505821Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6506074Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6506184Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6506265Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6506414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6506566Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6506696Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6506815Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6506866Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6506978Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6507022Z Traceback (most recent call last): 2025-09-07T07:34:42.6507208Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6507248Z self._run_test( 2025-09-07T07:34:42.6507360Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6507415Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6507455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6507587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6507656Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6508684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6508836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6508882Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6508920Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6509113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6509156Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6509193Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6509335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6509415Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6509456Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6509610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6509654Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6509804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6509880Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6509923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6510066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6510118Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6510156Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6510273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6510340Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6510383Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6511476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6511588Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6511630Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6511809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6511854Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6511891Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6512029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6512067Z return aot_autograd( 2025-09-07T07:34:42.6512104Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6512239Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6512309Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6512353Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6512517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6512603Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6512648Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6512830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6512874Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6513059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6513115Z fx_g = _create_graph( 2025-09-07T07:34:42.6513149Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6513318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6513352Z fx_g = make_fx( 2025-09-07T07:34:42.6514404Z ^^^^^^^^ 2025-09-07T07:34:42.6514562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6514609Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6514646Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6514792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6514834Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6514874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6515078Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6515116Z t = dispatch_trace( 2025-09-07T07:34:42.6515150Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6515262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6515321Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6515357Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6515485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6515524Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6515559Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6515721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6515800Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6515840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6515965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6516002Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6517120Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6517307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6517397Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6517432Z ^^^^^^^^^ 2025-09-07T07:34:42.6517567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6517607Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6517643Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6517792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6517843Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6517876Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6518033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6518095Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6518141Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6518319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6518403Z outs_pair = fn(*args) 2025-09-07T07:34:42.6518438Z ^^^^^^^^^ 2025-09-07T07:34:42.6518610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6518701Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6518744Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6518917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6518955Z outs_pair = fn(*args) 2025-09-07T07:34:42.6518991Z ^^^^^^^^^ 2025-09-07T07:34:42.6520274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6520338Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6520380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6520577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6520647Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6520695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6520868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6520907Z outs_pair = fn(*args) 2025-09-07T07:34:42.6520941Z ^^^^^^^^^ 2025-09-07T07:34:42.6521153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6521199Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6521236Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6521405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6521499Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6521535Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6521665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6521707Z return handle_torch_function( 2025-09-07T07:34:42.6521744Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6521883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6521959Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6522033Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6523177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6523219Z return func(*args, **kwargs) 2025-09-07T07:34:42.6523255Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6523378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6523421Z result = _engine_run_backward( 2025-09-07T07:34:42.6523456Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6523602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6523723Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6523820Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6523949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6523990Z return user_fn(self, *args) 2025-09-07T07:34:42.6524027Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6524170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6524232Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6524268Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6524426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6524470Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6524507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6524630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6524670Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6524705Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6524870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6525971Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6526012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6526149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6526201Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6526238Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6526399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6526446Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6526578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6526788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6526827Z t = dispatch_trace( 2025-09-07T07:34:42.6526861Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6526974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6527016Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6527053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6527178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6527215Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6527251Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6527412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6527510Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6527569Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6527694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6527732Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6528847Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6529023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6529066Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6529100Z ^^^^^^^^^ 2025-09-07T07:34:42.6529251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6529299Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6529334Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6529376Z File "<string>", line 1, in <lambda>
2025-09-07T07:34:42.6529526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6529603Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6529649Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6529831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6529903Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6529940Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6530132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6530175Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6530210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6530384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6530430Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6530468Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6530610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6531667Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6531703Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6531889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6531977Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6532023Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6532164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6532227Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6532271Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6532398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6532437Z leaves = list(leaves) 2025-09-07T07:34:42.6532471Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6532593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6532631Z return func(x) 2025-09-07T07:34:42.6532663Z ^^^^^^^ 2025-09-07T07:34:42.6532801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6532912Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6532954Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6533134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6533190Z return func(*args, **kwargs) 2025-09-07T07:34:42.6533225Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6533407Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6533493Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6533496Z 2025-09-07T07:34:42.6534813Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6534816Z 2025-09-07T07:34:42.6534818Z 2025-09-07T07:34:42.6534892Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6535091Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6535097Z 2025-09-07T07:34:42.6535184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6535260Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6535294Z inline_call [] 2025-09-07T07:34:42.6535351Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6535385Z inductor [] 2025-09-07T07:34:42.6535459Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6535552Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6535809Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6535922Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6535974Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6536125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6536212Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6536342Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6536461Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6536591Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6536627Z inline_call [] 2025-09-07T07:34:42.6536682Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6536753Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6537830Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6538089Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6538200Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6538250Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6538399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6538487Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6538616Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6538735Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6538806Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6538859Z inline_call [] 2025-09-07T07:34:42.6538931Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6539003Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6539073Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6539325Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6539435Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6539484Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6539633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6539717Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6539848Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6539965Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6540183Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-5fcc49b93a6b85e4.xml - 2025-09-07T07:34:42.6540310Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6541650Z FAILED [0.2774s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6541735Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6541741Z 2025-09-07T07:34:42.6541948Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6541950Z 2025-09-07T07:34:42.6541952Z 2025-09-07T07:34:42.6542024Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6542219Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True 2025-09-07T07:34:42.6542224Z 2025-09-07T07:34:42.6542310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6542370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.6542436Z ================== 1 failed, 245 deselected, 2 rerun in 1.32s ================== 2025-09-07T07:34:42.6542470Z Got exit code 1 2025-09-07T07:34:42.6542662Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.6543085Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6543125Z import pkg_resources 2025-09-07T07:34:42.6543293Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-33d061a67b96a6b2.xml 2025-09-07T07:34:42.6543350Z ============================= test session starts ============================== 2025-09-07T07:34:42.6543465Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6543504Z cachedir: .pytest_cache 2025-09-07T07:34:42.6543662Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6543736Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6543775Z configfile: pytest.ini 2025-09-07T07:34:42.6543938Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6544013Z collecting ... collected 467 items / 71 deselected / 396 selected 2025-09-07T07:34:42.6545086Z stepcurrent: skipping 71 already run items. 
2025-09-07T07:34:42.6545176Z Running 175 items in this shard 2025-09-07T07:34:42.6545178Z 2025-09-07T07:34:42.6545380Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1380s] [ 0%] 2025-09-07T07:34:42.6545574Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.8213s] [ 0%] 2025-09-07T07:34:42.6545751Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True FAILED [0.7812s] [ 0%] 2025-09-07T07:34:42.6545754Z 2025-09-07T07:34:42.6545802Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6545914Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6545957Z Traceback (most recent call last): 2025-09-07T07:34:42.6546110Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6546164Z self._run_test( 2025-09-07T07:34:42.6546278Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6546333Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6546373Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6546669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6546721Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6546760Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6546913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6546959Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6546997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6547138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6547181Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6548215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6548358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6548470Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6548511Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6548664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6548710Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6548860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6548915Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6548956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6549097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6549149Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6549187Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6549305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6549409Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6549455Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6549580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6549645Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6549686Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6549828Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6549871Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6549909Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6550047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6551060Z return aot_autograd( 2025-09-07T07:34:42.6551098Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6551238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6551307Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6551352Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6551514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6551628Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6551672Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6551856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6551900Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6552088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6552128Z fx_g = _create_graph( 2025-09-07T07:34:42.6552163Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6552326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6552360Z fx_g = make_fx( 2025-09-07T07:34:42.6552396Z ^^^^^^^^ 2025-09-07T07:34:42.6552547Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6552593Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6552631Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6552777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6552834Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6553845Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6554004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6554043Z t = dispatch_trace( 2025-09-07T07:34:42.6554076Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6554189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6554234Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6554270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6554395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6554436Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6554471Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6554633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6554742Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6554783Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6554909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6554947Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6554982Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6555108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6555151Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6555185Z ^^^^^^^^^ 2025-09-07T07:34:42.6555321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6555361Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6555396Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6556584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6556637Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6556671Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6556876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6556938Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6557016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6557192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6557231Z outs_pair = fn(*args) 2025-09-07T07:34:42.6557265Z ^^^^^^^^^ 2025-09-07T07:34:42.6557438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6557507Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6557552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6557724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6557762Z outs_pair = fn(*args) 2025-09-07T07:34:42.6557796Z ^^^^^^^^^ 2025-09-07T07:34:42.6557975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6558034Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6558076Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6558292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6558362Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6558409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6559614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6559654Z outs_pair = fn(*args) 2025-09-07T07:34:42.6559688Z ^^^^^^^^^ 2025-09-07T07:34:42.6559878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6559925Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6559961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6560130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6560240Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6560300Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6560443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6560485Z return handle_torch_function( 2025-09-07T07:34:42.6560522Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6560662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6560737Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6560830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6560998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6561038Z return func(*args, **kwargs) 2025-09-07T07:34:42.6561074Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6561198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6561243Z result = _engine_run_backward( 2025-09-07T07:34:42.6561278Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6561424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6562517Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6562588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6562715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6562758Z return user_fn(self, *args) 2025-09-07T07:34:42.6562794Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6562939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6563031Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6563070Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6563227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6563271Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6563306Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6563429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6563472Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6563507Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6563674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6563725Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6563779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6563919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6563969Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6564007Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6564169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6564215Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6565397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6565558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6565597Z t = dispatch_trace( 2025-09-07T07:34:42.6565630Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6565745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6565807Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6565864Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6565990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6566029Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6566063Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6566224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6566303Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6566344Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6566468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6566563Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6566597Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6566724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6566771Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6566805Z ^^^^^^^^^ 2025-09-07T07:34:42.6566956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6567004Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6568067Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6568140Z File "", line 1, in 2025-09-07T07:34:42.6568284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6568361Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6568407Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6568544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6568596Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6568633Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6568828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6568871Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6568907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6569080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6569124Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6569160Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6569303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6569344Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6569398Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6569532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6569621Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6569666Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6569791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6569853Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6570910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6571037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6571076Z leaves = list(leaves) 2025-09-07T07:34:42.6571110Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6571234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6571313Z return func(x) 2025-09-07T07:34:42.6571346Z ^^^^^^^ 2025-09-07T07:34:42.6571485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6571549Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6571590Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6571758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6571799Z return func(*args, **kwargs) 2025-09-07T07:34:42.6571834Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6572016Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6572147Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6572152Z 2025-09-07T07:34:42.6572358Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6572361Z 2025-09-07T07:34:42.6572362Z 2025-09-07T07:34:42.6572435Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6572669Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6572687Z 2025-09-07T07:34:42.6572773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6572849Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6572883Z inline_call [] 2025-09-07T07:34:42.6573957Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6573993Z inductor [] 2025-09-07T07:34:42.6574071Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6574145Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6574402Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6574513Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6574569Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6574719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6574805Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6574936Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6575078Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6575234Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6575279Z Traceback (most recent call last): 2025-09-07T07:34:42.6575430Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6575466Z self._run_test( 2025-09-07T07:34:42.6575580Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6575635Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6575675Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6575807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6575852Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6575908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6576080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6577181Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6577221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6577358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6577402Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6577439Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6577582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6577664Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6577702Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6577855Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6577906Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6578054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6578108Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6578148Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6578290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6578369Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6578409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6578526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6578593Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6578699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6578827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6578889Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6579907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6580047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6580095Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6580131Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6580270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6580309Z return aot_autograd( 2025-09-07T07:34:42.6580345Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6580505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.6580578Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6580623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6580784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6580866Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6580915Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6581097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6581140Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6581325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6581384Z fx_g = _create_graph( 2025-09-07T07:34:42.6581436Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6581599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6581634Z fx_g = make_fx( 2025-09-07T07:34:42.6581666Z ^^^^^^^^ 2025-09-07T07:34:42.6581818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6582880Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6582919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6583119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6583163Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6583199Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6583359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.6583402Z t = dispatch_trace( 2025-09-07T07:34:42.6583436Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6583549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6583590Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6583626Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6583753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6583815Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6583851Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6584014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6584094Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6584135Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6584261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6584299Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6584334Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6584460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6585550Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6585588Z ^^^^^^^^^ 2025-09-07T07:34:42.6585721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6585761Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6585795Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6585945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6586009Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6586046Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6586203Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6586265Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6586309Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6586609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6586650Z outs_pair = fn(*args) 2025-09-07T07:34:42.6586686Z ^^^^^^^^^ 2025-09-07T07:34:42.6586856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6586922Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6586967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6587182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6587220Z outs_pair = fn(*args) 2025-09-07T07:34:42.6587255Z ^^^^^^^^^ 2025-09-07T07:34:42.6587432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6587495Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6588608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6588804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6588874Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6588923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6589101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6589139Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.6589173Z ^^^^^^^^^ 2025-09-07T07:34:42.6589363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6589407Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6589466Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6589634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6589680Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6589716Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6589887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6589932Z return handle_torch_function( 2025-09-07T07:34:42.6589968Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6590111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6590185Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6590230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6590398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6590439Z return func(*args, **kwargs) 2025-09-07T07:34:42.6590474Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6591619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6591661Z result = _engine_run_backward( 2025-09-07T07:34:42.6591720Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6591914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6592035Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.6592083Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6592210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6592253Z return user_fn(self, *args) 2025-09-07T07:34:42.6592289Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6592432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6592475Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6592511Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6592745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6592789Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6592825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6592948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6593033Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6593068Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6593235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6593286Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6593325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6594430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6594485Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6594525Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6594689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6594735Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.6594774Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6594934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6594991Z t = dispatch_trace( 2025-09-07T07:34:42.6595026Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6595138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6595181Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6595265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6595394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6595435Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6595470Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6595634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6595714Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6595755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6595884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6595922Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6595957Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6596083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6597271Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6597338Z ^^^^^^^^^ 2025-09-07T07:34:42.6597497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6597545Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6597628Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6597671Z File "", line 1, in 2025-09-07T07:34:42.6597815Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6597897Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6597943Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6598080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6598126Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6598165Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6598404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6598450Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6598486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6598704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6598750Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6598787Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6598930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6598973Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6599008Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6600120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6600275Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6600321Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6600446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6600507Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.6600549Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6600703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6600742Z leaves = list(leaves) 2025-09-07T07:34:42.6600776Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6600899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6600934Z return func(x) 2025-09-07T07:34:42.6600968Z ^^^^^^^ 2025-09-07T07:34:42.6601107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6601173Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6601214Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6601382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6601423Z return func(*args, **kwargs) 2025-09-07T07:34:42.6601462Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6601643Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6601728Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6601731Z 2025-09-07T07:34:42.6602002Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6602008Z 2025-09-07T07:34:42.6602011Z 2025-09-07T07:34:42.6602084Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6603257Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6603260Z 2025-09-07T07:34:42.6603348Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6603426Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6603461Z inline_call [] 2025-09-07T07:34:42.6603518Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6603551Z inductor [] 2025-09-07T07:34:42.6603624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6603697Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6603996Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6604110Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6604160Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6604312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6604400Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6604531Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6604652Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6604724Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6604761Z inline_call [] 2025-09-07T07:34:42.6604817Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6604888Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6604957Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6605212Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6606303Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6606354Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6606567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6606653Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6606790Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6606909Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6606960Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6607069Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6607117Z Traceback (most recent call last): 2025-09-07T07:34:42.6607266Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6607302Z self._run_test( 2025-09-07T07:34:42.6607411Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6607467Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6607533Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6607670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6607715Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6607755Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6607905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6607954Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6607992Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6608128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6608172Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6609191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6609338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6609464Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6609502Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6609656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6609702Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6609851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6609906Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6609946Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6610089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6610140Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6610183Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6610351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6610418Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6610461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6610590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6610671Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6610713Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6610852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6610895Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6610932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6612043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6612087Z return aot_autograd( 2025-09-07T07:34:42.6612122Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6612260Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6612381Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6612427Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6612593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6612675Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6612720Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6612919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6612965Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6613150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6613190Z fx_g = _create_graph( 2025-09-07T07:34:42.6613225Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6613388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6613423Z fx_g = make_fx( 2025-09-07T07:34:42.6613456Z ^^^^^^^^ 2025-09-07T07:34:42.6613610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6613656Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6613694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6613855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6613914Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6614913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6615074Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6615112Z t = dispatch_trace( 2025-09-07T07:34:42.6615146Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6615260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6615302Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6615338Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6615463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6615502Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6615540Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6615705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6615784Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6615825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6615949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6615986Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6616040Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6616166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6616208Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6616242Z ^^^^^^^^^ 2025-09-07T07:34:42.6616375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6616417Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6616454Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6617703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6617753Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6617787Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6617943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6618008Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6618051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6618227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6618265Z outs_pair = fn(*args) 2025-09-07T07:34:42.6618300Z ^^^^^^^^^ 2025-09-07T07:34:42.6618505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6618573Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6618617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6618790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6618831Z outs_pair = fn(*args) 2025-09-07T07:34:42.6618866Z ^^^^^^^^^ 2025-09-07T07:34:42.6619044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6619104Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6619146Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6619365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6619456Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6619502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6620695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6620736Z outs_pair = fn(*args) 2025-09-07T07:34:42.6620770Z ^^^^^^^^^ 2025-09-07T07:34:42.6620961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6621005Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6621041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6621257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6621309Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6621346Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6621472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6621514Z return handle_torch_function( 2025-09-07T07:34:42.6621625Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6621767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6621867Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6621914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6622082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6622125Z return func(*args, **kwargs) 2025-09-07T07:34:42.6622161Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6622289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6622330Z result = _engine_run_backward( 2025-09-07T07:34:42.6622366Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6622513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6623602Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6623652Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6623827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6623868Z return user_fn(self, *args) 2025-09-07T07:34:42.6623921Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6624068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6624112Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6624148Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6624306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6624349Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6624388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6624511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6624551Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6624585Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6624751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6624818Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6624873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6625010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6625059Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6625098Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6625260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6626274Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6626314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6626474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6626587Z t = dispatch_trace( 2025-09-07T07:34:42.6626622Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6626740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6626784Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6626819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6626942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6626980Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6627015Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6627204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6627283Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6627323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6627447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6627486Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6627522Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6627650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6627691Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6627725Z ^^^^^^^^^ 2025-09-07T07:34:42.6627875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6627924Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6628928Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6628971Z File "", line 1, in 
2025-09-07T07:34:42.6629166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6629245Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6629315Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6629456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6629503Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6629542Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6629732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6629779Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6629813Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6629984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6630028Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6630065Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6630208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6630287Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6630323Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6630457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6630545Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6630591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6630718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6631738Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6631781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6631955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6631995Z leaves = list(leaves) 2025-09-07T07:34:42.6632034Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6632160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6632195Z return func(x) 2025-09-07T07:34:42.6632228Z ^^^^^^^ 2025-09-07T07:34:42.6632367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6632430Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6632490Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6632657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6632747Z return func(*args, **kwargs) 2025-09-07T07:34:42.6632783Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6632967Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6633056Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6633059Z 2025-09-07T07:34:42.6633264Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6633268Z 2025-09-07T07:34:42.6633270Z 2025-09-07T07:34:42.6633343Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6633541Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6633543Z 2025-09-07T07:34:42.6633674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6633749Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6633798Z inline_call [] 2025-09-07T07:34:42.6634824Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6634860Z inductor [] 2025-09-07T07:34:42.6634935Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6635006Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6635263Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6635376Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6635427Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6635580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6635664Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6635827Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6635946Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6636018Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6636051Z inline_call [] 2025-09-07T07:34:42.6636107Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6636180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6636251Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6636627Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6636738Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6636790Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6636941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6637025Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6638185Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6638340Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6638457Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6638492Z inline_call [] 2025-09-07T07:34:42.6638547Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6638619Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6638691Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6638951Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6639060Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6639109Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6639306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6639393Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6639522Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6639639Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6639873Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-33d061a67b96a6b2.xml - 2025-09-07T07:34:42.6639934Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6640393Z FAILED [0.7812s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6640477Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6640480Z 2025-09-07T07:34:42.6640685Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6640687Z 2025-09-07T07:34:42.6640689Z 2025-09-07T07:34:42.6640761Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6641000Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6641002Z 2025-09-07T07:34:42.6642071Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6642133Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6642198Z ================== 1 failed, 71 deselected, 2 rerun in 3.02s =================== 2025-09-07T07:34:42.6642235Z Got exit code 1 2025-09-07T07:34:42.6642323Z Retrying single test... 2025-09-07T07:34:42.6642750Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6642792Z import pkg_resources 2025-09-07T07:34:42.6642965Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-782e8bc3b1f5ef8d.xml 2025-09-07T07:34:42.6643023Z ============================= test session starts ============================== 2025-09-07T07:34:42.6643135Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6643173Z cachedir: .pytest_cache 2025-09-07T07:34:42.6643351Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6643396Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6643482Z configfile: pytest.ini 2025-09-07T07:34:42.6643643Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6643721Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6643954Z stepcurrent: skipping 71 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6643998Z Running 1 items in this shard 2025-09-07T07:34:42.6644000Z 2025-09-07T07:34:42.6644197Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1663s] [100%] 2025-09-07T07:34:42.6644393Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7943s] [100%] 2025-09-07T07:34:42.6644563Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True FAILED [0.7753s] [100%] 2025-09-07T07:34:42.6644566Z 2025-09-07T07:34:42.6644628Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6645767Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6645812Z Traceback (most recent call last): 2025-09-07T07:34:42.6645965Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6646000Z self._run_test( 2025-09-07T07:34:42.6646112Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6646171Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6646212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6646346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6646393Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6646433Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6646651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6646742Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6646781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6646918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6646962Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6647000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6647144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6647225Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6647263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6647464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6647514Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6648695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6648750Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6648791Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6648935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6649012Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6649050Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6649167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6649232Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6649276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6649404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6649471Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6649513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6649653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6649696Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6649733Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6649872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6649911Z return aot_autograd( 2025-09-07T07:34:42.6649946Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6650084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6650172Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6650223Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6650429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6651483Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6651530Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6651714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6651758Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6651943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6651982Z fx_g = _create_graph( 2025-09-07T07:34:42.6652018Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6652213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6652248Z fx_g = make_fx( 
2025-09-07T07:34:42.6652281Z ^^^^^^^^ 2025-09-07T07:34:42.6652433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6652479Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6652517Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6652664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6652707Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6652744Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6652902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6652944Z t = dispatch_trace( 2025-09-07T07:34:42.6652977Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6653091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6653133Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6654143Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6654270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6654328Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6654365Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6654526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6654606Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6654647Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6654772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6654814Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6654849Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6654973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6655015Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6655049Z ^^^^^^^^^ 2025-09-07T07:34:42.6655228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6655271Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6655307Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6655457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6655508Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6655541Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6655715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6655780Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6655825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6657048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6657090Z outs_pair = fn(*args) 2025-09-07T07:34:42.6657127Z ^^^^^^^^^ 2025-09-07T07:34:42.6657300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6657366Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6657412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6657585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6657681Z outs_pair = fn(*args) 2025-09-07T07:34:42.6657717Z ^^^^^^^^^ 2025-09-07T07:34:42.6657894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6657954Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6657996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6658192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6658262Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6658308Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6658483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6658526Z outs_pair = fn(*args) 2025-09-07T07:34:42.6658560Z ^^^^^^^^^ 2025-09-07T07:34:42.6658751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6658795Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6658832Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6659180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6660272Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6660310Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6660437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6660479Z return handle_torch_function( 2025-09-07T07:34:42.6660519Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6660661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6660735Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6660779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6660947Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6660990Z return func(*args, **kwargs) 2025-09-07T07:34:42.6661026Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6661149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6661191Z result = _engine_run_backward( 2025-09-07T07:34:42.6661226Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6661397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6661521Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6661570Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6661698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6661739Z return user_fn(self, *args) 2025-09-07T07:34:42.6661777Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6661921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6662931Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6662968Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6663127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6663189Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6663243Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6663450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6663492Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6663527Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6663693Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6663745Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6663784Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6663920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6663969Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6664010Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6664178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6664224Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6664263Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6664423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6664462Z t = dispatch_trace( 2025-09-07T07:34:42.6664511Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6664625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6664666Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6665672Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6665797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6665837Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6665874Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6666038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6666116Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6666157Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6666281Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6666321Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6666355Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6666479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6666581Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6666615Z ^^^^^^^^^ 2025-09-07T07:34:42.6666792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6666843Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6666877Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6666919Z File "", line 1, in 2025-09-07T07:34:42.6667064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6667141Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6667188Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6667323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6668346Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6668385Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6668579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6668666Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6668702Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6668873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6668917Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6668954Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6669098Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6669140Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6669175Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6669309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6669398Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6669446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6669570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6669630Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6669673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6669800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6669857Z leaves = list(leaves) 2025-09-07T07:34:42.6669891Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6670014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6670049Z return func(x) 2025-09-07T07:34:42.6671038Z ^^^^^^^ 2025-09-07T07:34:42.6671181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6671250Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6671292Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6671458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6671499Z return func(*args, **kwargs) 2025-09-07T07:34:42.6671534Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6671719Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6671804Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.6671806Z 2025-09-07T07:34:42.6672013Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6672031Z 2025-09-07T07:34:42.6672035Z 2025-09-07T07:34:42.6672108Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6672305Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6672307Z 2025-09-07T07:34:42.6672391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6672465Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6672500Z inline_call [] 2025-09-07T07:34:42.6672557Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6672590Z inductor [] 2025-09-07T07:34:42.6672664Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6672736Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6673010Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6673135Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6674148Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6674301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6674388Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6674520Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6674640Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6674750Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6674794Z Traceback (most recent call last): 2025-09-07T07:34:42.6674947Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6674983Z self._run_test( 2025-09-07T07:34:42.6675095Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6675151Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6675190Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6675339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6675385Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6675423Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6675572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6675618Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6675660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6675796Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6675840Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6675877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6676019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6677146Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6677187Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6677341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6677387Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6677568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6677625Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6677664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6677806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6677856Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6677894Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6678014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6678079Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6678122Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6678248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6678311Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6678381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6678538Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6678583Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6678619Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6678755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6678795Z return aot_autograd( 2025-09-07T07:34:42.6679795Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6679931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6680001Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6680046Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6680257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6680342Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6680388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6680572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6680639Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6680824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6680862Z fx_g = _create_graph( 2025-09-07T07:34:42.6680898Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6681060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6681097Z fx_g = make_fx( 2025-09-07T07:34:42.6681129Z ^^^^^^^^ 2025-09-07T07:34:42.6681281Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6681326Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6681364Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6681510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6681554Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6681589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6681747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6682750Z t = dispatch_trace( 2025-09-07T07:34:42.6682785Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6682915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6682961Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6682997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6683122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6683161Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6683198Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6683360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6683442Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6683481Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6683605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6683644Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6683679Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6683831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6683873Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6683909Z ^^^^^^^^^ 2025-09-07T07:34:42.6684041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6684082Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6684116Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6684266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6684315Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6685305Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6685462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6685525Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6685573Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6685749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6685787Z outs_pair = fn(*args) 2025-09-07T07:34:42.6685822Z ^^^^^^^^^ 2025-09-07T07:34:42.6685992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6686079Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6686123Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6686294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6686332Z outs_pair = fn(*args) 2025-09-07T07:34:42.6686368Z ^^^^^^^^^ 2025-09-07T07:34:42.6686615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6686676Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6686718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6686912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6686984Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6687029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6687202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6687239Z outs_pair = fn(*args) 2025-09-07T07:34:42.6688269Z ^^^^^^^^^ 2025-09-07T07:34:42.6688464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6688508Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6688544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6688713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6688761Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6688798Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6688922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6688964Z return handle_torch_function( 2025-09-07T07:34:42.6688999Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6689142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6689253Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6689300Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6689468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6689509Z return func(*args, **kwargs) 2025-09-07T07:34:42.6689544Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6689669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6689710Z result = _engine_run_backward( 2025-09-07T07:34:42.6689746Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6689891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6690014Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6690065Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6691152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6691195Z return user_fn(self, *args) 2025-09-07T07:34:42.6691231Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6691374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6691445Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6691481Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6691639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6691682Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6691719Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6691847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6691886Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6691922Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6692088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6692140Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6692182Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6692319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6692369Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6692408Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6692581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6692629Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6692669Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6693787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6693826Z t = dispatch_trace( 2025-09-07T07:34:42.6693861Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6693974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6694019Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6694055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6694178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6694216Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6694251Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6694411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6694523Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6694564Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6694688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6694726Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6694760Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6694888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6694933Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6694974Z ^^^^^^^^^ 2025-09-07T07:34:42.6695122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6695171Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6695206Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6695251Z File "", line 1, in 2025-09-07T07:34:42.6696345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6696425Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6696470Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6696664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6696736Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6696775Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6696966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6697010Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6697047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6697220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6697264Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6697301Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6697442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6697488Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6697523Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6697656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6697743Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6697788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6697936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6698000Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6698042Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6699144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6699184Z leaves = list(leaves) 2025-09-07T07:34:42.6699219Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6699345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6699379Z return func(x) 2025-09-07T07:34:42.6699412Z ^^^^^^^ 2025-09-07T07:34:42.6699549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6699614Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6699657Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6699863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6699904Z return func(*args, **kwargs) 2025-09-07T07:34:42.6699940Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6700119Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6700204Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6700208Z 2025-09-07T07:34:42.6700414Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6700416Z 2025-09-07T07:34:42.6700418Z 2025-09-07T07:34:42.6700490Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6700689Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6700693Z 2025-09-07T07:34:42.6700779Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6700853Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6700888Z inline_call [] 2025-09-07T07:34:42.6700944Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6700994Z inductor [] 2025-09-07T07:34:42.6702029Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6702103Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6702362Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6702476Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6702532Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6702682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6702767Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6702900Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6703022Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6703093Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6703127Z inline_call [] 2025-09-07T07:34:42.6703183Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6703269Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6703341Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6703593Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6703703Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6703752Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6703905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6703991Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6704121Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6704239Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6705283Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6705397Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6705441Z Traceback (most recent call last): 2025-09-07T07:34:42.6705591Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6705627Z self._run_test( 2025-09-07T07:34:42.6705740Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6705795Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6705835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6705967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6706012Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6706055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6706204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6706251Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6706288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6706423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6706559Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6706596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6706739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6706819Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6706858Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6707011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6707059Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6708178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6708233Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6708273Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6708419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6708469Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6708508Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6708623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6708725Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6708770Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6708898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6708960Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6709001Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6709140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6709186Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6709223Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6709361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6709399Z return aot_autograd( 2025-09-07T07:34:42.6709435Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6709570Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6709676Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6709722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6710841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6710925Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6710971Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6711154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6711196Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6711383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6711425Z fx_g = _create_graph( 2025-09-07T07:34:42.6711462Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6711624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6711659Z fx_g = make_fx( 2025-09-07T07:34:42.6711691Z ^^^^^^^^ 2025-09-07T07:34:42.6711842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6711909Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6711947Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6712092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6712135Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6712171Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6712333Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6712372Z t = dispatch_trace( 2025-09-07T07:34:42.6712406Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6712517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6712559Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6713549Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6713676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6713717Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6713754Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6713914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6713993Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6714049Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6714178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6714216Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6714250Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6714376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6714417Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6714455Z ^^^^^^^^^ 2025-09-07T07:34:42.6714587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6714628Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6714662Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6714812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6714861Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6714912Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6715082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6715146Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6715189Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6716371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6716412Z outs_pair = fn(*args) 2025-09-07T07:34:42.6716446Z ^^^^^^^^^ 2025-09-07T07:34:42.6716688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6716756Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6716801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6716978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6717017Z outs_pair = fn(*args) 2025-09-07T07:34:42.6717051Z ^^^^^^^^^ 2025-09-07T07:34:42.6717230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6717324Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6717367Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6717561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6717631Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6717678Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6717851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6717890Z outs_pair = fn(*args) 2025-09-07T07:34:42.6717924Z ^^^^^^^^^ 2025-09-07T07:34:42.6718113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6718159Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6718195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6719371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6719418Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6719454Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6719604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6719649Z return handle_torch_function( 2025-09-07T07:34:42.6719685Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6719826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6719900Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6719945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6720115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6720201Z return func(*args, **kwargs) 2025-09-07T07:34:42.6720236Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6720361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6720402Z result = _engine_run_backward( 2025-09-07T07:34:42.6720458Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6720621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6720743Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6720792Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6720918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6720960Z return user_fn(self, *args) 2025-09-07T07:34:42.6720997Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6721140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6722147Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6722186Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6722349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6722393Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6722429Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6722550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6722590Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6722644Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6722808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6722860Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6722901Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6723040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6723089Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6723129Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6723290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6723338Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6723377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6723536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6723574Z t = dispatch_trace( 2025-09-07T07:34:42.6723609Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6723720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6724715Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6724752Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6724893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6724935Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6724970Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6725130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6725208Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6725248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6725374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6725412Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6725447Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6725575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6725616Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6725667Z ^^^^^^^^^ 2025-09-07T07:34:42.6725829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6725878Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6725911Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6725954Z File "", line 1, in 
2025-09-07T07:34:42.6726096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6726176Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6726220Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6726355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6727434Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6727475Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6727671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6727716Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6727751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6727922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6727998Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6728035Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6728177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6728220Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6728256Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6728391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6728484Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6728530Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6728655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6728715Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6728758Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6728888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6728926Z leaves = list(leaves) 2025-09-07T07:34:42.6728960Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6729081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6730076Z return func(x) 2025-09-07T07:34:42.6730131Z ^^^^^^^ 2025-09-07T07:34:42.6730272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.6730337Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6730377Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6730544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6730587Z return func(*args, **kwargs) 2025-09-07T07:34:42.6730623Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6730808Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6730893Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6730895Z 2025-09-07T07:34:42.6731102Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6731137Z 2025-09-07T07:34:42.6731140Z 2025-09-07T07:34:42.6731214Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6731410Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6731413Z 2025-09-07T07:34:42.6731498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6731572Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6731608Z inline_call [] 2025-09-07T07:34:42.6731664Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6731698Z inductor [] 2025-09-07T07:34:42.6731772Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6731844Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6732102Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6732213Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6733223Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6733376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6733480Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6733610Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6733729Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6733802Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6733839Z inline_call [] 2025-09-07T07:34:42.6733895Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6733966Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6734036Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6734288Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6734400Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6734450Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6734599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6734705Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6734839Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6734956Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6735025Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6735059Z inline_call [] 2025-09-07T07:34:42.6735114Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6735185Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6736210Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6736463Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6736653Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6736732Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6736881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6736965Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6737097Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6737218Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6737434Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-782e8bc3b1f5ef8d.xml - 2025-09-07T07:34:42.6737491Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6737856Z FAILED [0.7753s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6737940Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6737943Z 2025-09-07T07:34:42.6738149Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6738173Z 2025-09-07T07:34:42.6738175Z 2025-09-07T07:34:42.6738246Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6738441Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6738444Z 2025-09-07T07:34:42.6738528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6738590Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6738655Z ================== 1 failed, 245 deselected, 2 rerun in 2.91s ================== 2025-09-07T07:34:42.6738691Z Got exit code 1 2025-09-07T07:34:42.6738728Z Retrying single test... 2025-09-07T07:34:42.6739150Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6739189Z import pkg_resources 2025-09-07T07:34:42.6740336Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-53eee21ef954e9db.xml 2025-09-07T07:34:42.6740414Z ============================= test session starts ============================== 2025-09-07T07:34:42.6740532Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6740571Z cachedir: .pytest_cache 2025-09-07T07:34:42.6740728Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6740773Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6740812Z configfile: pytest.ini 2025-09-07T07:34:42.6740974Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6741050Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6741280Z stepcurrent: skipping 71 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6741325Z Running 1 items in this shard 2025-09-07T07:34:42.6741346Z 2025-09-07T07:34:42.6741557Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.1824s] [100%] 2025-09-07T07:34:42.6741756Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9397s] [100%] 2025-09-07T07:34:42.6741926Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True FAILED [0.9795s] [100%] 2025-09-07T07:34:42.6741929Z 2025-09-07T07:34:42.6741978Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6742089Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6742133Z Traceback (most recent call last): 2025-09-07T07:34:42.6742287Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6742326Z self._run_test( 2025-09-07T07:34:42.6742438Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6742493Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6743495Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6743631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6743697Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6743736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6743887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6743934Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6743972Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6744111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6744158Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6744196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6744339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6744420Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6744461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6744614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6744659Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6744810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6744863Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6744917Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6745062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6745113Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6745152Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6746230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6746300Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6746344Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6746470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6746596Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6746639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6746831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6746876Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6746913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6747051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6747090Z return aot_autograd( 2025-09-07T07:34:42.6747126Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6747263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6747334Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6747379Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6747540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6747626Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6747673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6747856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6747900Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6748084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6748145Z fx_g = _create_graph( 2025-09-07T07:34:42.6749148Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6749313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6749348Z fx_g = make_fx( 
2025-09-07T07:34:42.6749381Z ^^^^^^^^ 2025-09-07T07:34:42.6749538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6749584Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6749620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6749766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6749809Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6749848Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6750006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6750043Z t = dispatch_trace( 2025-09-07T07:34:42.6750077Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6750190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6750232Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6750288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6750417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6750457Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6750493Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6750653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6750733Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6750774Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6751859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6751899Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6751934Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6752060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6752120Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6752168Z ^^^^^^^^^ 2025-09-07T07:34:42.6752302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6752342Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6752377Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6752526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6752577Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6752611Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6752767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6752829Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6752874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6753054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6753093Z outs_pair = fn(*args) 2025-09-07T07:34:42.6753128Z ^^^^^^^^^ 2025-09-07T07:34:42.6753302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6753369Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6753428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6754555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6754595Z outs_pair = fn(*args) 2025-09-07T07:34:42.6754630Z ^^^^^^^^^ 2025-09-07T07:34:42.6754807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6754871Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6754913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6755107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6755177Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6755225Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6755397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6755435Z outs_pair = fn(*args) 2025-09-07T07:34:42.6755469Z ^^^^^^^^^ 2025-09-07T07:34:42.6755676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6755725Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6755762Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6755929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6755976Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6756012Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6756141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6756183Z return handle_torch_function( 2025-09-07T07:34:42.6756219Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6756360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6757459Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6757559Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6757728Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6757768Z return func(*args, **kwargs) 2025-09-07T07:34:42.6757804Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6757926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6757969Z result = _engine_run_backward( 2025-09-07T07:34:42.6758005Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6758151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6758273Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6758324Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6758452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6758493Z return user_fn(self, *args) 2025-09-07T07:34:42.6758530Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6758673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6758717Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6758781Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6758939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6758982Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6759019Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6759143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6760194Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6760233Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6760401Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6760452Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6760492Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6760629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6760681Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6760718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6760881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6760927Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6760987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6761149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6761187Z t = dispatch_trace( 2025-09-07T07:34:42.6761221Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6761333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6761376Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6761413Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6761539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6761578Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6761613Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6761774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6761853Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6762897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6763023Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6763061Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6763096Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6763221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6763263Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6763297Z ^^^^^^^^^ 2025-09-07T07:34:42.6763450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6763497Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6763531Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6763572Z File "", line 1, in 2025-09-07T07:34:42.6763720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6763797Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6763843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6763978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6764025Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6764078Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6764270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6764314Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6764349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6764521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6765536Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6765575Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6765719Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6765761Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6765796Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6765933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6766021Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6766068Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6766192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6766270Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6766315Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6766444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6766570Z leaves = list(leaves) 2025-09-07T07:34:42.6766605Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6766728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6766764Z return func(x) 2025-09-07T07:34:42.6766796Z ^^^^^^^ 2025-09-07T07:34:42.6766934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6766999Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6767040Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6767208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6767298Z return func(*args, **kwargs) 2025-09-07T07:34:42.6768304Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6768486Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6768571Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.6768573Z 2025-09-07T07:34:42.6768780Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6768783Z 2025-09-07T07:34:42.6768785Z 2025-09-07T07:34:42.6768856Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6769054Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6769059Z 2025-09-07T07:34:42.6769144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6769219Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6769253Z inline_call [] 2025-09-07T07:34:42.6769310Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6769343Z inductor [] 2025-09-07T07:34:42.6769417Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6769514Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6769775Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6769886Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6769939Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6770093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6770179Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6770310Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6770430Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6770542Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6771546Z Traceback (most recent call last): 2025-09-07T07:34:42.6771698Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6771734Z self._run_test( 2025-09-07T07:34:42.6771866Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6771926Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6771966Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6772097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6772143Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6772181Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6772333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6772378Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6772418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6772552Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6772596Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6772650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6772813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6772893Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6772932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6773083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6773130Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6773281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6773333Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6774332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6774476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6774531Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6774570Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6774686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6774751Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6774794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6774940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6775001Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6775045Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6775183Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6775230Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6775269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6775406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6775446Z return aot_autograd( 2025-09-07T07:34:42.6775481Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6775620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6775690Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6775735Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6775895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6775978Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6776041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6777254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6777297Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6777482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6777521Z fx_g = _create_graph( 2025-09-07T07:34:42.6777560Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6777722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6777757Z fx_g = make_fx( 2025-09-07T07:34:42.6777789Z ^^^^^^^^ 2025-09-07T07:34:42.6777940Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6777986Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6778053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6778217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6778260Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6778296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6778455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6778493Z t = dispatch_trace( 2025-09-07T07:34:42.6778528Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6778642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6778683Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6778719Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6778844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6779850Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6779889Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6780051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6780128Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6780169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6780292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6780356Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6780390Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6780516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6780557Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6780592Z ^^^^^^^^^ 2025-09-07T07:34:42.6780725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6780768Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6780802Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6780951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6780999Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6781033Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6781192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6781254Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6781298Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6781490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6781532Z outs_pair = fn(*args) 2025-09-07T07:34:42.6782526Z ^^^^^^^^^ 2025-09-07T07:34:42.6782702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6782768Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6782812Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6782988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6783029Z outs_pair = fn(*args) 2025-09-07T07:34:42.6783063Z ^^^^^^^^^ 2025-09-07T07:34:42.6783239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6783299Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6783359Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6783567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6783639Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6783684Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6783857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6783897Z outs_pair = fn(*args) 2025-09-07T07:34:42.6783931Z ^^^^^^^^^ 2025-09-07T07:34:42.6784122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6784167Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6784204Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6784378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6784423Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6785414Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6785541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6785601Z return handle_torch_function( 2025-09-07T07:34:42.6785637Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6785779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6785852Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6785897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6786064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6786109Z return func(*args, **kwargs) 2025-09-07T07:34:42.6786144Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6786267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6786308Z result = _engine_run_backward( 2025-09-07T07:34:42.6786344Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6786556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6786676Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6786724Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6786876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6786919Z return user_fn(self, *args) 2025-09-07T07:34:42.6786956Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6787102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6787144Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6787181Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6788308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6788354Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6788390Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6788514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6788553Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6788589Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6788780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6788851Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6788889Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6789026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6789074Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6789113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6789275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6789321Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6789359Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6789520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6789560Z t = dispatch_trace( 2025-09-07T07:34:42.6789595Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6789709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6789751Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6789787Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6790866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6790929Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6790963Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6791123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6791202Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6791242Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6791366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6791409Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6791442Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6791569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6791610Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6791644Z ^^^^^^^^^ 2025-09-07T07:34:42.6791793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6791845Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6791878Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6791919Z File "", line 1, in 2025-09-07T07:34:42.6792064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6792155Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6792203Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6792340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6792387Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6792425Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6793571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6793618Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6793653Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6793825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6793868Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6793923Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6794079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6794122Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6794157Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6794291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6794378Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6794424Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6794549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6794608Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6794651Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6794777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6794818Z leaves = list(leaves) 2025-09-07T07:34:42.6794852Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6794975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6795009Z return func(x) 2025-09-07T07:34:42.6795042Z ^^^^^^^ 2025-09-07T07:34:42.6795179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6796224Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6796266Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6796434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6796474Z return func(*args, **kwargs) 2025-09-07T07:34:42.6796573Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6796758Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6796843Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6796845Z 2025-09-07T07:34:42.6797051Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6797056Z 2025-09-07T07:34:42.6797058Z 2025-09-07T07:34:42.6797131Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6797327Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6797330Z 2025-09-07T07:34:42.6797415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6797514Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6797553Z inline_call [] 2025-09-07T07:34:42.6797609Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6797643Z inductor [] 2025-09-07T07:34:42.6797716Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6797788Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6798045Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6798158Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6798209Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6798360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6799461Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6799596Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6799715Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6799787Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6799823Z inline_call [] 2025-09-07T07:34:42.6799878Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6799950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6800019Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6800335Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6800449Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6800499Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6800650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6800734Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6800884Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6801001Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6801051Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6801162Z _ WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True _ 2025-09-07T07:34:42.6801208Z Traceback (most recent call last): 2025-09-07T07:34:42.6801360Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1319, in test_while_loop_with_pytree_inputs 2025-09-07T07:34:42.6801396Z self._run_test( 2025-09-07T07:34:42.6801509Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6802536Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6802577Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6802713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6802759Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6802798Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6802948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6803010Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6803051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6803187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6803232Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6803269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6803412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6803495Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6803533Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6803685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6803731Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6803882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6803964Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6804005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6804148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6804198Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6804237Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6805308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6805375Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6805418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6805545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6805608Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6805654Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6805792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6805836Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6805872Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6806008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6806067Z return aot_autograd( 2025-09-07T07:34:42.6806101Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6806238Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6806306Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6806352Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6806580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6806666Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6806711Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6806897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6806941Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6807127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6807166Z fx_g = _create_graph( 2025-09-07T07:34:42.6808170Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6808359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6808396Z fx_g = make_fx( 2025-09-07T07:34:42.6808429Z ^^^^^^^^ 2025-09-07T07:34:42.6808582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6808628Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6808666Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6808812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6808857Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6808893Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6809052Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6809089Z t = dispatch_trace( 2025-09-07T07:34:42.6809124Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6809238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6809327Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6809363Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6809487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6809527Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6809563Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6809724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6809802Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6810801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6810927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6810966Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6811002Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6811132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6811172Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6811207Z ^^^^^^^^^ 2025-09-07T07:34:42.6811339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6811379Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6811439Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6811588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6811636Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6811670Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6811826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6811890Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6811936Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6812114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6812153Z outs_pair = fn(*args) 2025-09-07T07:34:42.6812188Z ^^^^^^^^^ 2025-09-07T07:34:42.6812359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6812427Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6812471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6813598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6813638Z outs_pair = fn(*args) 2025-09-07T07:34:42.6813690Z ^^^^^^^^^ 2025-09-07T07:34:42.6813870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6813930Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6813972Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6814167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6814239Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6814283Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6814457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6814494Z outs_pair = fn(*args) 2025-09-07T07:34:42.6814530Z ^^^^^^^^^ 2025-09-07T07:34:42.6814758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6814804Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6814840Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6815010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6815057Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6815094Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6815219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6815262Z return handle_torch_function( 2025-09-07T07:34:42.6815298Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6816393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6816471Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6816589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6816756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6816797Z return func(*args, **kwargs) 2025-09-07T07:34:42.6816832Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6816985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6817026Z result = _engine_run_backward( 2025-09-07T07:34:42.6817062Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6817208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6817332Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6817384Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6817511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6817551Z return user_fn(self, *args) 2025-09-07T07:34:42.6817588Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6817730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6817776Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6817813Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6817970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6818014Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6818067Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6818194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6819201Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6819239Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6819404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6819456Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6819497Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6819634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6819682Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6819720Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6819881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6819952Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6820009Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6820169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6820207Z t = dispatch_trace( 2025-09-07T07:34:42.6820241Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6820355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6820399Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6820435Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6820559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6820598Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6820632Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6820796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6821832Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6821874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6821998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6822036Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6822069Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6822214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6822255Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6822290Z ^^^^^^^^^ 2025-09-07T07:34:42.6822439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6822487Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6822524Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6822567Z File "", line 1, in 
2025-09-07T07:34:42.6822709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6822786Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6822831Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6822968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6823017Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6823055Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6823247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6823290Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6823340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6823514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6824509Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6824547Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6824690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6824737Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6824772Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6824907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6824993Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6825040Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6825181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6825257Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6825299Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6825424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6825463Z leaves = list(leaves) 2025-09-07T07:34:42.6825497Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6825623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6825658Z return func(x) 2025-09-07T07:34:42.6825692Z ^^^^^^^ 2025-09-07T07:34:42.6825830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6825896Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6825938Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6826107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6826147Z return func(*args, **kwargs) 2025-09-07T07:34:42.6827209Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6827391Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6827503Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6827506Z 2025-09-07T07:34:42.6827710Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6827713Z 2025-09-07T07:34:42.6827714Z 2025-09-07T07:34:42.6827787Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6827986Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6827991Z 2025-09-07T07:34:42.6828076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6828148Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6828184Z inline_call [] 2025-09-07T07:34:42.6828240Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6828278Z inductor [] 2025-09-07T07:34:42.6828351Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6828423Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6828680Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6828809Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6828864Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6829016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6829102Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6829232Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6829352Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6829423Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6830418Z inline_call [] 2025-09-07T07:34:42.6830475Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6830548Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6830659Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6830913Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6831024Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6831073Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6831225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6831310Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6831441Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6831561Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6831634Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6831668Z inline_call [] 2025-09-07T07:34:42.6831723Z stats [('calls_captured', 10), ('unique_graphs', 1)] 2025-09-07T07:34:42.6831793Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6831862Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6832130Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6832237Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 873, in forward 2025-09-07T07:34:42.6832286Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6832436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6832522Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6833606Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6833725Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6833941Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-53eee21ef954e9db.xml - 2025-09-07T07:34:42.6834002Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6834363Z FAILED [0.9795s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6834463Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6834468Z 2025-09-07T07:34:42.6834674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6834676Z 2025-09-07T07:34:42.6834678Z 2025-09-07T07:34:42.6834750Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6834947Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True 2025-09-07T07:34:42.6834950Z 2025-09-07T07:34:42.6835034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6835093Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.6835158Z ================== 1 failed, 245 deselected, 2 rerun in 3.38s ================== 2025-09-07T07:34:42.6835215Z Got exit code 1 2025-09-07T07:34:42.6835352Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.6835775Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6835815Z import pkg_resources 2025-09-07T07:34:42.6835985Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0ac1ceff773feb6b.xml 2025-09-07T07:34:42.6836039Z ============================= test session starts ============================== 2025-09-07T07:34:42.6836152Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6836191Z cachedir: .pytest_cache 2025-09-07T07:34:42.6836350Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6836394Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6837466Z configfile: pytest.ini 2025-09-07T07:34:42.6837629Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6837705Z collecting ... collected 467 items / 72 deselected / 395 selected 2025-09-07T07:34:42.6837785Z stepcurrent: skipping 72 already run items. 
2025-09-07T07:34:42.6837827Z Running 174 items in this shard 2025-09-07T07:34:42.6837830Z 2025-09-07T07:34:42.6838003Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_False PASSED [1.4870s] [ 0%] 2025-09-07T07:34:42.6838199Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2673s] [ 1%] 2025-09-07T07:34:42.6838394Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.5281s] [ 1%] 2025-09-07T07:34:42.6838560Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True FAILED [0.2301s] [ 1%] 2025-09-07T07:34:42.6838563Z 2025-09-07T07:34:42.6838610Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6838721Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6838764Z Traceback (most recent call last): 2025-09-07T07:34:42.6838915Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.6838949Z self._run_test( 2025-09-07T07:34:42.6839080Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6839137Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6839180Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6839314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6839360Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6839399Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6839554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in 
_wrapped_call_impl 2025-09-07T07:34:42.6840618Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6840659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6840796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6840841Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6840904Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6841067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6841149Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6841187Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6841340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6841388Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6841538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6841590Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6841631Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6841773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6841828Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6841866Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6841983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6842047Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6842092Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6842235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6842299Z return 
lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6842339Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6843445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6843491Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6843531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6843670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6843710Z return aot_autograd( 2025-09-07T07:34:42.6843745Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6843882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6843953Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6843999Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6844159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6844242Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6844301Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6844488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6844531Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6844719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6844759Z fx_g = _create_graph( 2025-09-07T07:34:42.6844795Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6844958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6844992Z 
fx_g = make_fx( 2025-09-07T07:34:42.6845024Z ^^^^^^^^ 2025-09-07T07:34:42.6845176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6846178Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6846235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6846397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6846440Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6846477Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6846699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6846738Z t = dispatch_trace( 2025-09-07T07:34:42.6846772Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6846885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6846926Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6846962Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6847086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6847129Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6847166Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6847329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6847407Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6847449Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6847575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6847639Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6847673Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6847801Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6847843Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6848848Z ^^^^^^^^^ 2025-09-07T07:34:42.6848986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6849027Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6849062Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6849210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6849259Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6849293Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6849452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6849514Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6849558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6849753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6849796Z outs_pair = fn(*args) 2025-09-07T07:34:42.6849832Z ^^^^^^^^^ 2025-09-07T07:34:42.6850005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6850071Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6850116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6850288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6850329Z outs_pair = fn(*args) 2025-09-07T07:34:42.6850363Z ^^^^^^^^^ 2025-09-07T07:34:42.6850541Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6850602Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6851617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6851832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6851903Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6851949Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6852123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6852163Z outs_pair = fn(*args) 2025-09-07T07:34:42.6852197Z ^^^^^^^^^ 2025-09-07T07:34:42.6852386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6852431Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6852471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6852640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6852685Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6852721Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6852847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6852905Z return handle_torch_function( 2025-09-07T07:34:42.6852940Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6853082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6853156Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.6853201Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6853369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6853411Z return func(*args, **kwargs) 2025-09-07T07:34:42.6853447Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6854525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6854568Z result = _engine_run_backward( 2025-09-07T07:34:42.6854603Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6854756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6854878Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6854928Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6855073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6855117Z return user_fn(self, *args) 2025-09-07T07:34:42.6855155Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6855300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6855342Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6855379Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6855536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6855581Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6855617Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6855740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6855779Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.6855815Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6855998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6856065Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6856104Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6857266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6857317Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6857356Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6857517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6857565Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6857603Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6857763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6857804Z t = dispatch_trace( 2025-09-07T07:34:42.6857840Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6857952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6857995Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6858032Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6858154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6858221Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6858255Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6858416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6858494Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6858534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6858659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6858699Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6858732Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6858858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6859858Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6859894Z ^^^^^^^^^ 2025-09-07T07:34:42.6860044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6860097Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6860129Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6860172Z File "", line 1, in 2025-09-07T07:34:42.6860314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6860412Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6860461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6860598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6860645Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6860682Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6860875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6860920Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6860956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6861126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6861171Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6861233Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.6861394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6861436Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6861472Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6861607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6862649Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6862696Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6862822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6862882Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6862924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6863052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6863093Z leaves = list(leaves) 2025-09-07T07:34:42.6863127Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6863250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6863285Z return func(x) 2025-09-07T07:34:42.6863318Z ^^^^^^^ 2025-09-07T07:34:42.6863455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6863539Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6863580Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6863749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6863790Z return func(*args, **kwargs) 2025-09-07T07:34:42.6863827Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6864009Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6864094Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6864097Z 2025-09-07T07:34:42.6864304Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6864308Z 2025-09-07T07:34:42.6864309Z 2025-09-07T07:34:42.6864381Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6865528Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6865531Z 2025-09-07T07:34:42.6865619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6865710Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6865748Z inline_call [] 2025-09-07T07:34:42.6865804Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6865879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6865949Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6866209Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6866326Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6866376Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6866591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6866701Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6866850Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6866972Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6867079Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6867122Z Traceback (most recent call last): 2025-09-07T07:34:42.6867272Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.6867307Z self._run_test( 2025-09-07T07:34:42.6867420Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6867474Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6867516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6867650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6868670Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6868709Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6868859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6868905Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6868968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6869103Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6869146Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6869183Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6869328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6869410Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6869450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6869602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6869648Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6869799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6869855Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6869894Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6870038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6870088Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6870127Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6870261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6870326Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6871326Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6871454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6871517Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6871561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6871699Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6871742Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6871780Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6871918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6871975Z return aot_autograd( 2025-09-07T07:34:42.6872024Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6872160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6872228Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6872274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6872434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6872517Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6872561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6872746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6872791Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6872978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6873016Z fx_g = _create_graph( 2025-09-07T07:34:42.6873052Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6873213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6874224Z fx_g = make_fx( 2025-09-07T07:34:42.6874257Z ^^^^^^^^ 2025-09-07T07:34:42.6874409Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6874454Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6874492Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6874639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6874684Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6874721Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6874880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6874917Z t = dispatch_trace( 2025-09-07T07:34:42.6874950Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6875064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6875108Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6875145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6875268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6875309Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6875344Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6875520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6875602Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6875643Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6875766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6876814Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6876850Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6876980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6877021Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6877056Z ^^^^^^^^^ 2025-09-07T07:34:42.6877187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6877228Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6877263Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6877455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6877504Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6877539Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6877696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6877758Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6877804Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6877979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6878017Z outs_pair = fn(*args) 2025-09-07T07:34:42.6878051Z ^^^^^^^^^ 2025-09-07T07:34:42.6878225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6878294Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6878340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6878512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6878551Z outs_pair = fn(*args) 2025-09-07T07:34:42.6879571Z ^^^^^^^^^ 2025-09-07T07:34:42.6879751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6879811Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6879854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6880051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6880125Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6880224Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6880398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6880436Z outs_pair = fn(*args) 2025-09-07T07:34:42.6880470Z ^^^^^^^^^ 2025-09-07T07:34:42.6880662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6880707Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6880743Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6880933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6880981Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6881020Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6881147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6881190Z return handle_torch_function( 2025-09-07T07:34:42.6881226Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6881368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6881442Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6882445Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6882613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6882654Z return func(*args, **kwargs) 2025-09-07T07:34:42.6882690Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6882848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6882889Z result = _engine_run_backward( 2025-09-07T07:34:42.6882925Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6883072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6883192Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6883242Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6883367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6883409Z return user_fn(self, *args) 2025-09-07T07:34:42.6883444Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6883591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6883638Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6883675Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6883831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6883875Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6883911Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6884049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6884088Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6884125Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6885240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6885294Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6885336Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6885475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6885524Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6885562Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6885723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6885775Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6885813Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6885973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6886010Z t = dispatch_trace( 2025-09-07T07:34:42.6886045Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6886172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6886219Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6886254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6886378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6886418Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6886453Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6886681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6886760Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6886801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6886925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6887932Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6887968Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6888136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6888178Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6888213Z ^^^^^^^^^ 2025-09-07T07:34:42.6888362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6888412Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6888446Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6888488Z File "", line 1, in 2025-09-07T07:34:42.6888631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6888708Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6888752Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6888890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6888941Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6888979Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6889172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6889215Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6889250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6889446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6889491Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6889527Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6890629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6890676Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6890712Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6890846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6890934Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6890979Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6891105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6891167Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6891211Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6891335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6891374Z leaves = list(leaves) 2025-09-07T07:34:42.6891430Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6891556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6891591Z return func(x) 2025-09-07T07:34:42.6891624Z ^^^^^^^ 2025-09-07T07:34:42.6891761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6891827Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6891868Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6892036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6892076Z return func(*args, **kwargs) 2025-09-07T07:34:42.6892113Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6892291Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6893344Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6893372Z 2025-09-07T07:34:42.6893580Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6893582Z 2025-09-07T07:34:42.6893584Z 2025-09-07T07:34:42.6893656Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6893850Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6893853Z 2025-09-07T07:34:42.6893938Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6894012Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6894047Z inline_call [] 2025-09-07T07:34:42.6894104Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6894180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6894251Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6894507Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6894619Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6894685Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6894835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6894921Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6895054Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6895177Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6895247Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6895282Z inline_call [] 2025-09-07T07:34:42.6895337Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6895409Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6896434Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6896754Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6896865Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6896916Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6897091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6897176Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6897305Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6897424Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6897474Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6897583Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6897626Z Traceback (most recent call last): 2025-09-07T07:34:42.6897775Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.6897810Z self._run_test( 2025-09-07T07:34:42.6897965Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6898021Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6898061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6898195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6898240Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6898280Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6898431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6898478Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6898516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6899633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6899679Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6899722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6899864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6899946Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6899984Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6900135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6900204Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6900354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6900407Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6900448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6900591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6900645Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6900683Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6900798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6900864Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6900908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6901034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6901098Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6901139Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6901291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6902293Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6902334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6902471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6902511Z return aot_autograd( 2025-09-07T07:34:42.6902546Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6902681Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6902752Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6902798Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6902959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6903041Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6903105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6903302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6903345Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6903531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6903572Z fx_g = _create_graph( 2025-09-07T07:34:42.6903607Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6903770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6903805Z fx_g = make_fx( 2025-09-07T07:34:42.6903838Z ^^^^^^^^ 2025-09-07T07:34:42.6903990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6904038Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6904076Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6905187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6905230Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6905268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6905425Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6905482Z t = dispatch_trace( 2025-09-07T07:34:42.6905516Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6905629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6905670Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6905706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6905831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6905875Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6905911Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6906072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6906151Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6906191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6906318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6906356Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6906391Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6906578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6906620Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6906676Z ^^^^^^^^^ 2025-09-07T07:34:42.6907780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6907822Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6907857Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6908006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6908055Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6908092Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.6908249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6908310Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6908354Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6908529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6908608Z outs_pair = fn(*args) 2025-09-07T07:34:42.6908643Z ^^^^^^^^^ 2025-09-07T07:34:42.6908815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6908880Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6908924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6909098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6909136Z outs_pair = fn(*args) 2025-09-07T07:34:42.6909171Z ^^^^^^^^^ 2025-09-07T07:34:42.6909347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6909408Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6909454Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6909648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6910672Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6910719Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6910916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.6910955Z outs_pair = fn(*args) 2025-09-07T07:34:42.6910989Z ^^^^^^^^^ 2025-09-07T07:34:42.6911180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6911226Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6911266Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6911435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6911481Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6911518Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6911644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6911688Z return handle_torch_function( 2025-09-07T07:34:42.6911725Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6911865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6911940Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6911985Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6912174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6912216Z return func(*args, **kwargs) 2025-09-07T07:34:42.6912252Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6912375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6913373Z result = _engine_run_backward( 2025-09-07T07:34:42.6913409Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6913558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6913678Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6913728Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6913855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6913935Z return user_fn(self, *args) 2025-09-07T07:34:42.6913972Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6914117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6914160Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6914196Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6914354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6914400Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6914437Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6914559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6914599Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6914637Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6914805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6914857Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6914897Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6915032Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6915080Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6916086Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6916248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.6916295Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6916334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6916542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6916584Z t = dispatch_trace( 2025-09-07T07:34:42.6916619Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6916732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6916774Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6916810Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6916932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6916974Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6917008Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6917169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6917248Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6917313Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6917441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6917478Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6917513Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6917637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6917679Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6917713Z ^^^^^^^^^ 2025-09-07T07:34:42.6918833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6918882Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6918917Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6918958Z File "", line 1, in 
2025-09-07T07:34:42.6919102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6919222Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6919268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6919403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6919450Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6919487Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6919681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6919723Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6919759Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6919930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6919975Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6920013Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6920202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6920245Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6920280Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6920416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6920526Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6921535Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6921661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6921721Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.6921764Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6921893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6921931Z leaves = list(leaves) 2025-09-07T07:34:42.6921966Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6922090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6922125Z return func(x) 2025-09-07T07:34:42.6922157Z ^^^^^^^ 2025-09-07T07:34:42.6922297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6922360Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6922401Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6922567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6922623Z return func(*args, **kwargs) 2025-09-07T07:34:42.6922660Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6922844Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6922929Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6922931Z 2025-09-07T07:34:42.6923139Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6923142Z 2025-09-07T07:34:42.6923144Z 2025-09-07T07:34:42.6923215Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6923412Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6923414Z 2025-09-07T07:34:42.6923499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6924570Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6924606Z inline_call [] 2025-09-07T07:34:42.6924662Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6924735Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6924807Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6925063Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6925177Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6925227Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6925378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6925467Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6925598Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6925717Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6925787Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6925838Z inline_call [] 2025-09-07T07:34:42.6925893Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6925966Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6926035Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6926289Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6926404Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6926453Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6926675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6927730Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6927862Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6927980Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6928050Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6928084Z inline_call [] 2025-09-07T07:34:42.6928139Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6928235Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6928307Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6928558Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6928669Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6928720Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6928867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6928951Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6929081Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6929235Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6929453Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0ac1ceff773feb6b.xml - 2025-09-07T07:34:42.6929510Z =========================== short test summary info ============================ 2025-09-07T07:34:42.6929874Z FAILED [0.2301s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6929959Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6929961Z 2025-09-07T07:34:42.6930169Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6930172Z 2025-09-07T07:34:42.6930173Z 2025-09-07T07:34:42.6930245Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6931403Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6931406Z 2025-09-07T07:34:42.6931490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6931578Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.6931648Z ============= 1 failed, 1 passed, 72 deselected, 2 rerun in 2.82s ============== 2025-09-07T07:34:42.6931684Z Got exit code 1 2025-09-07T07:34:42.6931722Z Retrying single test... 2025-09-07T07:34:42.6932150Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.6932191Z import pkg_resources 2025-09-07T07:34:42.6932360Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-6667f56e9c45294a.xml 2025-09-07T07:34:42.6932414Z ============================= test session starts ============================== 2025-09-07T07:34:42.6932531Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.6932569Z cachedir: .pytest_cache 2025-09-07T07:34:42.6932727Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.6932771Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.6932810Z configfile: pytest.ini 2025-09-07T07:34:42.6932989Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.6933069Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.6933298Z stepcurrent: skipping 73 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6933341Z Running 1 items in this shard 2025-09-07T07:34:42.6933343Z 2025-09-07T07:34:42.6933538Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4242s] [100%] 2025-09-07T07:34:42.6933732Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2437s] [100%] 2025-09-07T07:34:42.6933899Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True FAILED [0.2302s] [100%] 2025-09-07T07:34:42.6933920Z 2025-09-07T07:34:42.6934950Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.6935060Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6935103Z Traceback (most recent call last): 2025-09-07T07:34:42.6935255Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.6935293Z self._run_test( 2025-09-07T07:34:42.6935405Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6935461Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6935501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6935636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6935683Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6935725Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6935878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.6935925Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6935963Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6936099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6936161Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6936199Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6936341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6936421Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6936461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6936677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6937688Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6937841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6937893Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6937933Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6938079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6938130Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6938169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6938285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6938379Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6938426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6938553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6938615Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.6938658Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6938797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6938842Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6938879Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6939018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6939057Z return aot_autograd( 2025-09-07T07:34:42.6939093Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6939229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6939336Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6939382Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6940505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6940589Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6940635Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6940818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6940862Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6941048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6941092Z fx_g = _create_graph( 2025-09-07T07:34:42.6941127Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6941290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6941324Z fx_g = make_fx( 
2025-09-07T07:34:42.6941357Z ^^^^^^^^ 2025-09-07T07:34:42.6941508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6941578Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6941616Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6941762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6941805Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6941843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6942006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6942043Z t = dispatch_trace( 2025-09-07T07:34:42.6942077Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6942189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6943185Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6943222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6943351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6943391Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6943428Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6943588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6943685Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6943727Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6943853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6943891Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6943926Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6944051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.6944093Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6944128Z ^^^^^^^^^ 2025-09-07T07:34:42.6944260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6944301Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6944336Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6944486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6944550Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6944598Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6944755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6944817Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6945815Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6945995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6946035Z outs_pair = fn(*args) 2025-09-07T07:34:42.6946070Z ^^^^^^^^^ 2025-09-07T07:34:42.6946241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6946308Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6946356Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6946595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6946633Z outs_pair = fn(*args) 2025-09-07T07:34:42.6946668Z ^^^^^^^^^ 2025-09-07T07:34:42.6946843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6946930Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.6946972Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6947171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6947241Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6947289Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6947466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6947505Z outs_pair = fn(*args) 2025-09-07T07:34:42.6947538Z ^^^^^^^^^ 2025-09-07T07:34:42.6947728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6947773Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6948771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6948942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6948988Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6949024Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6949173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6949218Z return handle_torch_function( 2025-09-07T07:34:42.6949255Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6949396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6949470Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6949515Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6949684Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6949725Z return func(*args, **kwargs) 2025-09-07T07:34:42.6949760Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6949884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6949927Z result = _engine_run_backward( 2025-09-07T07:34:42.6949985Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6950148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6950270Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6950317Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6950445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6950486Z return user_fn(self, *args) 2025-09-07T07:34:42.6950522Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6951625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6951670Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6951707Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6951870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6951914Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6951951Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6952073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6952114Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6952175Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6952340Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6952391Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6952431Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6952569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6952620Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6952659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6952821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6952869Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6952907Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6953065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6953104Z t = dispatch_trace( 2025-09-07T07:34:42.6953138Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6953250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6954255Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6954291Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6954432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6954472Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6954507Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6954668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6954747Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6954790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6954915Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6954952Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6954987Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6955113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6955156Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6955204Z ^^^^^^^^^ 2025-09-07T07:34:42.6955369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6955418Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6955451Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6955493Z File "", line 1, in 2025-09-07T07:34:42.6955636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6955714Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6955758Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6956918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6956967Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6957007Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6957204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6957247Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6957282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6957454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6957525Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6957562Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6957705Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6957747Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6957782Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6957918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6958010Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6958056Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6958180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6958240Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6958284Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6958409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6958447Z leaves = list(leaves) 2025-09-07T07:34:42.6958481Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6958603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6959618Z return func(x) 2025-09-07T07:34:42.6959653Z ^^^^^^^ 2025-09-07T07:34:42.6959792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6959857Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6959899Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6960066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6960108Z return func(*args, **kwargs) 2025-09-07T07:34:42.6960182Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6960363Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6960448Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.6960450Z 2025-09-07T07:34:42.6960674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6960696Z 2025-09-07T07:34:42.6960697Z 2025-09-07T07:34:42.6960770Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6960966Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6960968Z 2025-09-07T07:34:42.6961054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6961129Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6961163Z inline_call [] 2025-09-07T07:34:42.6961221Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6961254Z inductor [] 2025-09-07T07:34:42.6961326Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6961399Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6961659Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.6962738Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6962791Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6962969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6963055Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6963185Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.6963305Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6963417Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6963460Z Traceback (most recent call last): 2025-09-07T07:34:42.6963609Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.6963643Z self._run_test( 2025-09-07T07:34:42.6963756Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6963813Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6963854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6963985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6964031Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6964069Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6964234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6964283Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6964323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6964458Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6964503Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6964539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6965645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6965726Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6965765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6965916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6965962Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6966142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6966196Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6966236Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6966376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6966429Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6966468Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6966713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6966779Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6966823Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6966950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6967018Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6967063Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6967205Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6967248Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6967286Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6967451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6968470Z return aot_autograd( 2025-09-07T07:34:42.6968507Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6968644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6968714Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6968762Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6968924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6969007Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6969051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6969233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.6969277Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.6969464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.6969503Z fx_g = _create_graph( 2025-09-07T07:34:42.6969538Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6969723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.6969758Z fx_g = make_fx( 2025-09-07T07:34:42.6969790Z ^^^^^^^^ 2025-09-07T07:34:42.6969941Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.6969986Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.6970024Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6970173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.6970215Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.6970251Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6971362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6971402Z t = dispatch_trace( 2025-09-07T07:34:42.6971459Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6971597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6971639Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6971675Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6971799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6971840Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6971878Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6972038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6972116Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6972157Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6972281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6972322Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6972356Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6972482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6972523Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6972558Z ^^^^^^^^^ 2025-09-07T07:34:42.6972690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.6972747Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.6972781Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6973884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6973934Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6973968Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6974129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.6974194Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.6974237Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6974412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6974450Z outs_pair = fn(*args) 2025-09-07T07:34:42.6974487Z ^^^^^^^^^ 2025-09-07T07:34:42.6974658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.6974724Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.6974768Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6974955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6974998Z outs_pair = fn(*args) 2025-09-07T07:34:42.6975032Z ^^^^^^^^^ 2025-09-07T07:34:42.6975209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.6975268Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.6975311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6975508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.6975578Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.6975623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6975796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.6977016Z outs_pair = fn(*args) 2025-09-07T07:34:42.6977054Z ^^^^^^^^^ 2025-09-07T07:34:42.6977247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6977292Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6977328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6977500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.6977545Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.6977582Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6977707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.6977750Z return handle_torch_function( 2025-09-07T07:34:42.6977789Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6977934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.6978008Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.6978054Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6978220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.6978281Z return func(*args, **kwargs) 2025-09-07T07:34:42.6978317Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6978441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.6978482Z result = _engine_run_backward( 2025-09-07T07:34:42.6978517Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6978665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.6978787Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6979888Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6980015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.6980057Z return user_fn(self, *args) 2025-09-07T07:34:42.6980094Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6980240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.6980282Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.6980319Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6980500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.6980546Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.6980584Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6980706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6980745Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6980782Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6980947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.6981000Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.6981039Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6981177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.6981226Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.6981266Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6981465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.6981513Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.6982513Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6982673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.6982711Z t = dispatch_trace( 2025-09-07T07:34:42.6982746Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6982858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.6982900Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.6982936Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6983059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6983100Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6983137Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6983299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.6983377Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.6983418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6983541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.6983599Z return fn(*args, **kwargs) 2025-09-07T07:34:42.6983633Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6983759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.6983800Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.6983835Z ^^^^^^^^^ 2025-09-07T07:34:42.6983986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.6984038Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.6984071Z ^^^^^^^^^^^ 2025-09-07T07:34:42.6985068Z File "", line 1, in 2025-09-07T07:34:42.6985212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.6985291Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.6985338Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6985474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.6985521Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.6985559Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6985765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.6985811Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.6985847Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6986016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.6986061Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.6986097Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6986244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.6986286Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.6986322Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.6986456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.6986620Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.6986708Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6986834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.6986894Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.6987918Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6988044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.6988084Z leaves = list(leaves) 2025-09-07T07:34:42.6988118Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.6988241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.6988275Z return func(x) 2025-09-07T07:34:42.6988308Z ^^^^^^^ 2025-09-07T07:34:42.6988447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.6988515Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.6988556Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6988723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.6988763Z return func(*args, **kwargs) 2025-09-07T07:34:42.6988799Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6988982Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.6989093Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.6989095Z 2025-09-07T07:34:42.6989302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.6989304Z 2025-09-07T07:34:42.6989311Z 2025-09-07T07:34:42.6989386Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.6989583Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.6989585Z 2025-09-07T07:34:42.6989670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.6989743Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6989780Z inline_call [] 2025-09-07T07:34:42.6989836Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6990833Z inductor [] 2025-09-07T07:34:42.6990908Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6990979Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6991260Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6991377Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6991428Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6991579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6991665Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6991799Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6991917Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6991988Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.6992021Z inline_call [] 2025-09-07T07:34:42.6992077Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.6992178Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.6992248Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.6992502Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.6992615Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.6992665Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.6992813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.6992898Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.6993027Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.6994106Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.6994158Z =================================== FAILURES =================================== 2025-09-07T07:34:42.6994266Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.6994309Z Traceback (most recent call last): 2025-09-07T07:34:42.6994476Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.6994511Z self._run_test( 2025-09-07T07:34:42.6994624Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.6994678Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.6994718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6994851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.6994900Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.6994939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6995089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.6995135Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.6995174Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6995312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.6995356Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.6995393Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6995537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.6995636Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.6995678Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6995829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.6996891Z raise BackendCompilerFailed( 2025-09-07T07:34:42.6997043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.6997099Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6997139Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6997281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.6997331Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.6997371Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6997487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.6997598Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.6997644Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6997770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.6997833Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.6997874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6998016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.6998059Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.6998097Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6998235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.6998276Z return aot_autograd( 2025-09-07T07:34:42.6998313Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.6998450Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.6998518Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.6999529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.6999691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.6999797Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.6999841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7000023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7000067Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7000306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7000345Z fx_g = _create_graph( 2025-09-07T07:34:42.7000380Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7000543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7000578Z fx_g = make_fx( 2025-09-07T07:34:42.7000612Z ^^^^^^^^ 2025-09-07T07:34:42.7000762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7000808Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7000846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7000991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7001054Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7001092Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7001251Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7001289Z t = dispatch_trace( 2025-09-07T07:34:42.7001322Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7001435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7002441Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7002478Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7002604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7002644Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7002680Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7002842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7002954Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7002996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7003120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7003158Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7003193Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7003318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7003361Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7003396Z ^^^^^^^^^ 2025-09-07T07:34:42.7003528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7003568Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7003603Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7003756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7003805Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7003838Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7003995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7005020Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7005082Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7005258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7005296Z outs_pair = fn(*args) 2025-09-07T07:34:42.7005331Z ^^^^^^^^^ 2025-09-07T07:34:42.7005504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7005573Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7005618Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7005789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7005828Z outs_pair = fn(*args) 2025-09-07T07:34:42.7005862Z ^^^^^^^^^ 2025-09-07T07:34:42.7006042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7006101Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7006144Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7006349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7006420Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7006467Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7006706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7006744Z outs_pair = fn(*args) 2025-09-07T07:34:42.7006779Z ^^^^^^^^^ 2025-09-07T07:34:42.7006967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7007013Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7008024Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7008194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7008240Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7008304Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7008447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7008491Z return handle_torch_function( 2025-09-07T07:34:42.7008527Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7008670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7008746Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7008792Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7008958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7008999Z return func(*args, **kwargs) 2025-09-07T07:34:42.7009033Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7009157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7009203Z result = _engine_run_backward( 2025-09-07T07:34:42.7009238Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7009385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7009505Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7009575Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7009700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7009742Z return user_fn(self, *args) 2025-09-07T07:34:42.7009778Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7010890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7010935Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7010974Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7011132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7011176Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7011212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7011335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7011377Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7011413Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7011579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7011631Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7011690Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7011832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7011882Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7011921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7012082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7012129Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7012168Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7012327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7012365Z t = dispatch_trace( 2025-09-07T07:34:42.7012398Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7013470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7013532Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7013588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7013713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7013752Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7013787Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7013947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7014026Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7014067Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7014189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7014227Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7014261Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7014387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7014432Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7014467Z ^^^^^^^^^ 2025-09-07T07:34:42.7014618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7014667Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7014701Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7014765Z File "", line 1, in 
2025-09-07T07:34:42.7014907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7014984Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7015029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7016127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7016180Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7016218Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7016408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7016452Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7016588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7016763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7016806Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7016842Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7016986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7017029Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7017095Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7017230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7017318Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7017363Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7017490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7017552Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7017595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7017721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7017759Z leaves = list(leaves) 2025-09-07T07:34:42.7017793Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7018879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7018960Z return func(x) 2025-09-07T07:34:42.7018994Z ^^^^^^^ 2025-09-07T07:34:42.7019132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7019196Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7019236Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7019405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7019445Z return func(*args, **kwargs) 2025-09-07T07:34:42.7019480Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7019660Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7019747Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7019752Z 2025-09-07T07:34:42.7019958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7019960Z 2025-09-07T07:34:42.7019962Z 2025-09-07T07:34:42.7020034Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7020229Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7020252Z 2025-09-07T07:34:42.7020337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7020410Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7020445Z inline_call [] 2025-09-07T07:34:42.7020503Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7020537Z inductor [] 2025-09-07T07:34:42.7020613Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7020685Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7020944Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7022018Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7022073Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7022224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7022309Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7022456Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7022579Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7022651Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7022685Z inline_call [] 2025-09-07T07:34:42.7022741Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7022814Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7022883Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7023138Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7023251Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7023300Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7023464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7023564Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7023693Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7023811Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7023881Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7023916Z inline_call [] 2025-09-07T07:34:42.7023970Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7025002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7025072Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7025325Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7025439Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7025488Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7025635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7025738Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7025867Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7025985Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7026200Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-6667f56e9c45294a.xml - 2025-09-07T07:34:42.7026260Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7026681Z FAILED [0.2302s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7026766Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7026770Z 2025-09-07T07:34:42.7026977Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7026979Z 2025-09-07T07:34:42.7026981Z 2025-09-07T07:34:42.7027051Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7027267Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7027273Z 2025-09-07T07:34:42.7027357Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7027417Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.7027482Z ================== 1 failed, 245 deselected, 2 rerun in 1.17s ================== 2025-09-07T07:34:42.7027517Z Got exit code 1 2025-09-07T07:34:42.7027556Z Retrying single test... 2025-09-07T07:34:42.7029014Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7029055Z import pkg_resources 2025-09-07T07:34:42.7029227Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-3c90abee91630544.xml 2025-09-07T07:34:42.7029328Z ============================= test session starts ============================== 2025-09-07T07:34:42.7029442Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7029481Z cachedir: .pytest_cache 2025-09-07T07:34:42.7029638Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7029685Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7029724Z configfile: pytest.ini 2025-09-07T07:34:42.7029886Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7029963Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7030194Z stepcurrent: skipping 73 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7030239Z Running 1 items in this shard 2025-09-07T07:34:42.7030241Z 2025-09-07T07:34:42.7030485Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4338s] [100%] 2025-09-07T07:34:42.7030680Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2453s] [100%] 2025-09-07T07:34:42.7030867Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True FAILED [0.2275s] [100%] 2025-09-07T07:34:42.7030870Z 2025-09-07T07:34:42.7030920Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7031032Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7031077Z Traceback (most recent call last): 2025-09-07T07:34:42.7031231Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7031267Z self._run_test( 2025-09-07T07:34:42.7031379Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7032415Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7032456Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7032594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7032641Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7032680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7032831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.7032895Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7032937Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7033073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7033117Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7033155Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7033297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7033380Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7033419Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7033619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7033665Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7033817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7033907Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7033948Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7034091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7034141Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7035149Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7035267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7035333Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7035376Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7035504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7035568Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.7035615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7035754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7035799Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7035835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7035974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7036031Z return aot_autograd( 2025-09-07T07:34:42.7036067Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7036250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7036320Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7036368Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7036630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7036715Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7036761Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7036944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7036989Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7037175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7038243Z fx_g = _create_graph( 2025-09-07T07:34:42.7038279Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7038470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7038506Z fx_g = make_fx( 
2025-09-07T07:34:42.7038539Z ^^^^^^^^ 2025-09-07T07:34:42.7038691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7038736Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7038775Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7038920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7038965Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7039001Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7039160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7039198Z t = dispatch_trace( 2025-09-07T07:34:42.7039232Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7039345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7039426Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7039507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7039632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7039672Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7039707Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7039869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7040980Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7041022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7041149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7041187Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7041227Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7041355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.7041396Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7041432Z ^^^^^^^^^ 2025-09-07T07:34:42.7041563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7041604Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7041663Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7041813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7041862Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7041896Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7042054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7042119Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7042163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7042340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7042379Z outs_pair = fn(*args) 2025-09-07T07:34:42.7042414Z ^^^^^^^^^ 2025-09-07T07:34:42.7042584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7042654Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7043717Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7043893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7043949Z outs_pair = fn(*args) 2025-09-07T07:34:42.7043985Z ^^^^^^^^^ 2025-09-07T07:34:42.7044166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7044225Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7044268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7044462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7044534Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7044580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7044751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7044790Z outs_pair = fn(*args) 2025-09-07T07:34:42.7044840Z ^^^^^^^^^ 2025-09-07T07:34:42.7045092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7045139Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7045175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7045344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7045391Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7045429Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7045554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7045598Z return handle_torch_function( 2025-09-07T07:34:42.7045634Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7046826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7046906Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7046952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7047119Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7047162Z return func(*args, **kwargs) 2025-09-07T07:34:42.7047222Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7047347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7047389Z result = _engine_run_backward( 2025-09-07T07:34:42.7047425Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7047571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7047694Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7047745Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7047872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7047912Z return user_fn(self, *args) 2025-09-07T07:34:42.7047949Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7048092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7048139Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7048175Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7048335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7048378Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7048432Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7049585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7049627Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7049663Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7049827Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7049880Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7049921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7050108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7050157Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7050196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7050359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7050451Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7050490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7050649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7050687Z t = dispatch_trace( 2025-09-07T07:34:42.7050721Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7050833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7050877Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7050912Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7051035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7051074Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7051109Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7051270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7052412Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7052454Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7052622Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7052661Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7052716Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7052842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7052883Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7052919Z ^^^^^^^^^ 2025-09-07T07:34:42.7053068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7053118Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7053153Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7053197Z File "", line 1, in 2025-09-07T07:34:42.7053341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7053420Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7053464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7053602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7053648Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7053686Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7053876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7053933Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7053970Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7055246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7055292Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7055329Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7055472Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7055518Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7055553Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7055687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7055775Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7055821Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7055985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7056045Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7056088Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7056214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7056252Z leaves = list(leaves) 2025-09-07T07:34:42.7056287Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7056411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7056445Z return func(x) 2025-09-07T07:34:42.7056479Z ^^^^^^^ 2025-09-07T07:34:42.7056687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7056753Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7056797Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7056968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7057990Z return func(*args, **kwargs) 2025-09-07T07:34:42.7058028Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7058209Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7058323Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.7058325Z 2025-09-07T07:34:42.7058533Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7058535Z 2025-09-07T07:34:42.7058537Z 2025-09-07T07:34:42.7058611Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7058810Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7058814Z 2025-09-07T07:34:42.7058898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7058972Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7059009Z inline_call [] 2025-09-07T07:34:42.7059065Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7059102Z inductor [] 2025-09-07T07:34:42.7059176Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7059247Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7059521Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7059640Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7059691Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7059845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7059930Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7060108Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7060229Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7061360Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7061405Z Traceback (most recent call last): 2025-09-07T07:34:42.7061601Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7061658Z self._run_test( 2025-09-07T07:34:42.7061789Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7061843Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7061884Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7062015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7062063Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7062101Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7062252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7062297Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7062337Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7062472Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7062521Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7062558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7062699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7062781Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7062819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7062987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7063032Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7063182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7064206Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7064250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7064392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7064444Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7064482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7064598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7064665Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7064710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7064882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7064946Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7064987Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7065148Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7065194Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7065232Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7065368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7065408Z return aot_autograd( 2025-09-07T07:34:42.7065442Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7065581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7065649Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7065695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7065857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7067004Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7067073Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7067260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7067302Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7067489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7067529Z fx_g = _create_graph( 2025-09-07T07:34:42.7067564Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7067726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7067761Z fx_g = make_fx( 2025-09-07T07:34:42.7067793Z ^^^^^^^^ 2025-09-07T07:34:42.7067945Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7067994Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7068032Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7068177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7068219Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7068256Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7068481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7068520Z t = dispatch_trace( 2025-09-07T07:34:42.7068553Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7068666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7068707Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7068746Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7069842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7069883Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7069918Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7070126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7070204Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7070248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7070371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7070410Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7070444Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7070592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7070635Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7070670Z ^^^^^^^^^ 2025-09-07T07:34:42.7070803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7070843Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7070879Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7071028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7071081Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7071114Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7071269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7071330Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7071427Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7071638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7072648Z outs_pair = fn(*args) 2025-09-07T07:34:42.7072683Z ^^^^^^^^^ 2025-09-07T07:34:42.7072857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7072923Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7072968Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7073142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7073261Z outs_pair = fn(*args) 2025-09-07T07:34:42.7073295Z ^^^^^^^^^ 2025-09-07T07:34:42.7073475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7073537Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7073580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7073773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7073844Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7073908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7074127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7074166Z outs_pair = fn(*args) 2025-09-07T07:34:42.7074200Z ^^^^^^^^^ 2025-09-07T07:34:42.7074390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7074439Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7074475Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7074644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7075652Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7075690Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7075819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7075861Z return handle_torch_function( 2025-09-07T07:34:42.7075896Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7076037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7076137Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7076184Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7076355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7076396Z return func(*args, **kwargs) 2025-09-07T07:34:42.7076432Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7076619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7076663Z result = _engine_run_backward( 2025-09-07T07:34:42.7076698Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7076844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7076964Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7077017Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7077234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7077277Z return user_fn(self, *args) 2025-09-07T07:34:42.7077313Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7077459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7077502Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7078569Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7078727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7078771Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7078807Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7078932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7078975Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7079013Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7079178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7079230Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7079269Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7079411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7079487Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7079525Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7079689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7079735Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7079776Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7079938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7079977Z t = dispatch_trace( 2025-09-07T07:34:42.7080011Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7080124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7080219Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7081230Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7081357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7081397Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7081431Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7081592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7081691Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7081734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7081858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7081897Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7081931Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7082057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7082099Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7082134Z ^^^^^^^^^ 2025-09-07T07:34:42.7082282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7082331Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7082415Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7082457Z File "", line 1, in 2025-09-07T07:34:42.7082619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7082712Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7082759Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7082894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7082941Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7083994Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7084188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7084231Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7084268Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7084439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7084488Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7084524Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7084669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7084711Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7084747Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7084938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7085025Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7085071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7085196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7085257Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7085303Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7085430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7085469Z leaves = list(leaves) 2025-09-07T07:34:42.7085502Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7085627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7085663Z return func(x) 2025-09-07T07:34:42.7086753Z ^^^^^^^ 2025-09-07T07:34:42.7086944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7087010Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7087051Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7087242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7087286Z return func(*args, **kwargs) 2025-09-07T07:34:42.7087322Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7087505Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7087590Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7087592Z 2025-09-07T07:34:42.7087801Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7087804Z 2025-09-07T07:34:42.7087806Z 2025-09-07T07:34:42.7087878Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7088074Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7088096Z 2025-09-07T07:34:42.7088200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7088275Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7088310Z inline_call [] 2025-09-07T07:34:42.7088366Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7088400Z inductor [] 2025-09-07T07:34:42.7088474Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7088546Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7088807Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7088922Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7088975Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7090164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7090252Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7090382Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7090500Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7090595Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7090630Z inline_call [] 2025-09-07T07:34:42.7090685Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7090758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7090827Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7091086Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7091243Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7091294Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7091442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7091530Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7091658Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7091777Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7091840Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7091952Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7091995Z Traceback (most recent call last): 2025-09-07T07:34:42.7092143Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7092178Z self._run_test( 2025-09-07T07:34:42.7093267Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7093325Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7093366Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7093499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7093545Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7093584Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7093738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7093815Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7093855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7093989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7094033Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7094070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7094214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7094295Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7094334Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7094485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7094534Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7094729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7094830Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7094871Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7095013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7095089Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7096101Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7096220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7096286Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7096331Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7096458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7096599Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7096640Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7096780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7096823Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7096864Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7097051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7097091Z return aot_autograd( 2025-09-07T07:34:42.7097125Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7097262Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7097353Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7097405Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7097566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7097649Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7097694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7097880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7097922Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7099134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7099174Z fx_g = _create_graph( 2025-09-07T07:34:42.7099211Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7099418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7099454Z fx_g = make_fx( 2025-09-07T07:34:42.7099486Z ^^^^^^^^ 2025-09-07T07:34:42.7099685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7099731Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7099769Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7099917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7099960Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7099997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7100155Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7100200Z t = dispatch_trace( 2025-09-07T07:34:42.7100234Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7100348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7100389Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7100425Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7100549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7100612Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7100648Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7100810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7101955Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7101997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7102124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7102167Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7102202Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7102328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7102370Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7102405Z ^^^^^^^^^ 2025-09-07T07:34:42.7102538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7102583Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7102617Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7102767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7102817Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7102850Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7103026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7103088Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7103134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7103309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7103349Z outs_pair = fn(*args) 2025-09-07T07:34:42.7103432Z ^^^^^^^^^ 2025-09-07T07:34:42.7103604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7103669Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7104679Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7104854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7104930Z outs_pair = fn(*args) 2025-09-07T07:34:42.7104964Z ^^^^^^^^^ 2025-09-07T07:34:42.7105143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7105202Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7105245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7105442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7105513Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7105561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7105785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7105827Z outs_pair = fn(*args) 2025-09-07T07:34:42.7105862Z ^^^^^^^^^ 2025-09-07T07:34:42.7106054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7106100Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7106136Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7106322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7106367Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7106406Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7106653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7106698Z return handle_torch_function( 2025-09-07T07:34:42.7107769Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7107913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7107987Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7108031Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7108199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7108243Z return func(*args, **kwargs) 2025-09-07T07:34:42.7108279Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7108402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7108445Z result = _engine_run_backward( 2025-09-07T07:34:42.7108480Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7108654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7108780Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7108830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7109008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7109051Z return user_fn(self, *args) 2025-09-07T07:34:42.7109086Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7109233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7109276Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7109313Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7109470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7109534Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7109589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7110682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7110722Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7110758Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7110922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7110975Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7111015Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7111152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7111201Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7111243Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7111408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7111455Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7111494Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7111652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7111714Z t = dispatch_trace( 2025-09-07T07:34:42.7111748Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7111863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7111905Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7111941Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7112064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7112104Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7112140Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7113260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7113339Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7113380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7113503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7113545Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7113578Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7113705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7113746Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7113781Z ^^^^^^^^^ 2025-09-07T07:34:42.7113945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7113998Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7114031Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7114072Z File "", line 1, in 
2025-09-07T07:34:42.7114217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7114294Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7114340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7114476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7114523Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7114560Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7114790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7114869Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7114905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7116046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7116093Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7116129Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7116273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7116314Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7116350Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7116553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7116643Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7116693Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7116820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7116879Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7116923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7117047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7117113Z leaves = list(leaves) 2025-09-07T07:34:42.7117147Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7117270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7117305Z return func(x) 2025-09-07T07:34:42.7117337Z ^^^^^^^ 2025-09-07T07:34:42.7117476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7117544Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7117586Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7118743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7118785Z return func(*args, **kwargs) 2025-09-07T07:34:42.7118820Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7119006Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7119090Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7119094Z 2025-09-07T07:34:42.7119300Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7119332Z 2025-09-07T07:34:42.7119334Z 2025-09-07T07:34:42.7119409Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7119603Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7119606Z 2025-09-07T07:34:42.7119692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7119765Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7119803Z inline_call [] 2025-09-07T07:34:42.7119859Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7119893Z inductor [] 2025-09-07T07:34:42.7119967Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7120038Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7120403Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7120585Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7120637Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7120788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7120876Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7121006Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7121128Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7122273Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7122310Z inline_call [] 2025-09-07T07:34:42.7122369Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7122445Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7122515Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7122768Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7122899Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7122949Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7123097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7123181Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7123311Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7123432Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7123502Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7123537Z inline_call [] 2025-09-07T07:34:42.7123591Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7123662Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7123734Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7123985Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7124095Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7124164Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7124312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7125414Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7125545Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7125664Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7125882Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-3c90abee91630544.xml - 2025-09-07T07:34:42.7125987Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7126364Z FAILED [0.2275s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7126461Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7126463Z 2025-09-07T07:34:42.7126772Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7126778Z 2025-09-07T07:34:42.7126780Z 2025-09-07T07:34:42.7126851Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7127044Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7127047Z 2025-09-07T07:34:42.7127130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7127192Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.7127260Z ================== 1 failed, 245 deselected, 2 rerun in 1.18s ================== 2025-09-07T07:34:42.7127296Z Got exit code 1 2025-09-07T07:34:42.7127419Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.7127841Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7127908Z import pkg_resources 2025-09-07T07:34:42.7128079Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ce157019e5db02d2.xml 2025-09-07T07:34:42.7128136Z ============================= test session starts ============================== 2025-09-07T07:34:42.7128297Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7128337Z cachedir: .pytest_cache 2025-09-07T07:34:42.7129581Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7129626Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7129666Z configfile: pytest.ini 2025-09-07T07:34:42.7129827Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7129905Z collecting ... collected 467 items / 74 deselected / 393 selected 2025-09-07T07:34:42.7129956Z stepcurrent: skipping 74 already run items. 
2025-09-07T07:34:42.7129997Z Running 172 items in this shard 2025-09-07T07:34:42.7130000Z 2025-09-07T07:34:42.7130197Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_False PASSED [2.6706s] [ 0%] 2025-09-07T07:34:42.7130394Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7934s] [ 1%] 2025-09-07T07:34:42.7130586Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7223s] [ 1%] 2025-09-07T07:34:42.7130751Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True FAILED [0.7178s] [ 1%] 2025-09-07T07:34:42.7130755Z 2025-09-07T07:34:42.7130805Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7130915Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7130958Z Traceback (most recent call last): 2025-09-07T07:34:42.7131112Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7131166Z self._run_test( 2025-09-07T07:34:42.7131297Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7131355Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7131395Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7131532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7131580Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7131620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7132815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in 
_wrapped_call_impl 2025-09-07T07:34:42.7132864Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7132902Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7133041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7133089Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7133127Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7133270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7133354Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7133392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7133564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7133610Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7133760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7133813Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7133858Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7134001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7134052Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7134091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7134208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7134274Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7134318Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7134444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7135473Z return 
lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7135516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7135710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7135758Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7135794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7135933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7135972Z return aot_autograd( 2025-09-07T07:34:42.7136007Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7136145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7136214Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7136259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7136421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7136627Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7136695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7136877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7136922Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7137107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7137148Z fx_g = _create_graph( 2025-09-07T07:34:42.7137183Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7137346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7137380Z 
fx_g = make_fx( 2025-09-07T07:34:42.7137413Z ^^^^^^^^ 2025-09-07T07:34:42.7138652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7138703Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7138740Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7138887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7138929Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7138966Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7139149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7139233Z t = dispatch_trace( 2025-09-07T07:34:42.7139267Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7139380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7139421Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7139459Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7139586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7139626Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7139662Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7139824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7139904Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7139947Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7140071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7140155Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7140191Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7140334Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7141350Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7141387Z ^^^^^^^^^ 2025-09-07T07:34:42.7141521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7141561Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7141597Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7141749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7141802Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7141835Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7141992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7142054Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7142133Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7142344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7142385Z outs_pair = fn(*args) 2025-09-07T07:34:42.7142419Z ^^^^^^^^^ 2025-09-07T07:34:42.7142590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7142657Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7142703Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7142876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7142914Z outs_pair = fn(*args) 2025-09-07T07:34:42.7142950Z ^^^^^^^^^ 2025-09-07T07:34:42.7143126Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7144207Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7144251Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7144445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7144514Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7144581Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7144753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7144792Z outs_pair = fn(*args) 2025-09-07T07:34:42.7144826Z ^^^^^^^^^ 2025-09-07T07:34:42.7145018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7145065Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7145102Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7145269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7145316Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7145352Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7145482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7145570Z return handle_torch_function( 2025-09-07T07:34:42.7145607Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7145748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7145837Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.7145883Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7146052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7146092Z return func(*args, **kwargs) 2025-09-07T07:34:42.7148196Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7148330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7148375Z result = _engine_run_backward( 2025-09-07T07:34:42.7148410Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7148595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7148718Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7148767Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7148935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7148977Z return user_fn(self, *args) 2025-09-07T07:34:42.7149015Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7149169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7149212Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7149251Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7149410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7149455Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7149491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7149614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7149852Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.7149890Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7151426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7151483Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7151525Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7151665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7151753Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7151791Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7151955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7152004Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7152043Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7152205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7152244Z t = dispatch_trace( 2025-09-07T07:34:42.7152278Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7152392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7152436Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7152475Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7152603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7152642Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7152678Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7152895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7152976Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7153018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7153143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7153181Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7153217Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7153390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7153438Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7153473Z ^^^^^^^^^ 2025-09-07T07:34:42.7153622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7153674Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7154813Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7154857Z File "", line 1, in 2025-09-07T07:34:42.7155021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7155149Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7155196Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7155334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7155382Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7155422Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7155615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7155658Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7155694Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7155867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7155915Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7155951Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.7156095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7156188Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7156225Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7156383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7156473Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7156595Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7156721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7156782Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7157873Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7158000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7158039Z leaves = list(leaves) 2025-09-07T07:34:42.7158073Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7158244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7158281Z return func(x) 2025-09-07T07:34:42.7158313Z ^^^^^^^ 2025-09-07T07:34:42.7158452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7158516Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7158559Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7158725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7158770Z return func(*args, **kwargs) 2025-09-07T07:34:42.7158805Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7158986Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7159102Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7159106Z 2025-09-07T07:34:42.7159318Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7159321Z 2025-09-07T07:34:42.7159323Z 2025-09-07T07:34:42.7159396Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7159590Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7159612Z 2025-09-07T07:34:42.7159715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7159793Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7159828Z inline_call [] 2025-09-07T07:34:42.7160933Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7161010Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7161084Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7161342Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7161457Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7161508Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7161663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7161786Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7161920Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7162040Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7162174Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7162217Z Traceback (most recent call last): 2025-09-07T07:34:42.7162369Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7162404Z self._run_test( 2025-09-07T07:34:42.7162518Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7162573Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7162615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7162791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7162838Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7162877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7163028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7163074Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7164130Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7164268Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7164312Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7164349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7164496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7164576Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7164615Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7164784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7164832Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7164983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7165037Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7165077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7165264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7165329Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7165381Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7165499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7165566Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7165612Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7165738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7165802Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7165843Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7167103Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7167149Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7167187Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7167327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7167367Z return aot_autograd( 2025-09-07T07:34:42.7167402Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7167539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7167607Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7167680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7167842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7167971Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7168016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7168203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7168288Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7168477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7168516Z fx_g = _create_graph( 2025-09-07T07:34:42.7168552Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7168770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7168804Z fx_g = make_fx( 2025-09-07T07:34:42.7168837Z ^^^^^^^^ 2025-09-07T07:34:42.7168991Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7169036Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7170051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7170201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7170244Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7170281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7170461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7170502Z t = dispatch_trace( 2025-09-07T07:34:42.7170535Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7170648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7170689Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7170725Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7170853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7170893Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7170946Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7171140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7171220Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7171262Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7171386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7171426Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7171460Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7171586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7171627Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7172627Z ^^^^^^^^^ 2025-09-07T07:34:42.7172759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7172803Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7172839Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7172988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7173090Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7173125Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7173280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7173360Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7173405Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7173625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7173665Z outs_pair = fn(*args) 2025-09-07T07:34:42.7173701Z ^^^^^^^^^ 2025-09-07T07:34:42.7173926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7173994Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7174040Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7174212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7174252Z outs_pair = fn(*args) 2025-09-07T07:34:42.7174285Z ^^^^^^^^^ 2025-09-07T07:34:42.7174462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7174521Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7174564Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7175816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7175888Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7175951Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7176127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7176168Z outs_pair = fn(*args) 2025-09-07T07:34:42.7176202Z ^^^^^^^^^ 2025-09-07T07:34:42.7176393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7176438Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7176474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7176768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7176815Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7176852Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7177026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7177069Z return handle_torch_function( 2025-09-07T07:34:42.7177107Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7177249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7177323Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7177369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7177537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7177579Z return func(*args, **kwargs) 2025-09-07T07:34:42.7177614Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7178717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7178761Z result = _engine_run_backward( 2025-09-07T07:34:42.7178902Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7179051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7179196Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7179246Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7179372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7179414Z return user_fn(self, *args) 2025-09-07T07:34:42.7179452Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7179598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7179641Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7179678Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7179836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7179884Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7179919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7180043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7180083Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7180119Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7180284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7180338Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7180377Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7180513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7181553Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7181594Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7181759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7181807Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7181845Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7182005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7182043Z t = dispatch_trace( 2025-09-07T07:34:42.7182096Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7182262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7182305Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7182342Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7182468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7182507Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7182543Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7182704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7182782Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7182824Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7182947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7182988Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7183022Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7183147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7183188Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7184190Z ^^^^^^^^^ 2025-09-07T07:34:42.7184341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7184409Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7184442Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7184484Z File "", line 1, in 2025-09-07T07:34:42.7184627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7184705Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7184752Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7184889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7184936Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7184975Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7185167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7185213Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7185249Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7185422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7185466Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7185503Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7185699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7185742Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7185778Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7185925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7187052Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7187101Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7187227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7187287Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7187330Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7187455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7187535Z leaves = list(leaves) 2025-09-07T07:34:42.7187570Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7187746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7187780Z return func(x) 2025-09-07T07:34:42.7187814Z ^^^^^^^ 2025-09-07T07:34:42.7187952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7188019Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7188060Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7188227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7188267Z return func(*args, **kwargs) 2025-09-07T07:34:42.7188303Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7188486Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7188571Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7188575Z 2025-09-07T07:34:42.7188825Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7188828Z 2025-09-07T07:34:42.7188849Z 2025-09-07T07:34:42.7188923Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7189115Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7189117Z 2025-09-07T07:34:42.7190229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7190305Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7190342Z inline_call [] 2025-09-07T07:34:42.7190399Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7190474Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7190545Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7190804Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7190921Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7190973Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7191123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7191208Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7191341Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7191461Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7191600Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7191636Z inline_call [] 2025-09-07T07:34:42.7191690Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7191763Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7191836Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7192092Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7192206Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7192285Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7193451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7193539Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7193668Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7193788Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7193839Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7193946Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7193990Z Traceback (most recent call last): 2025-09-07T07:34:42.7194138Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7194176Z self._run_test( 2025-09-07T07:34:42.7194288Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7194343Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7194385Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7194518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7194582Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7194621Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7194771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7194819Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7194857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7194997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7195043Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7195080Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7195259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7196300Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7196341Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7196568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7196614Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7196764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7196817Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7196861Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7197003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7197055Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7197115Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7197232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7197344Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7197388Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7197515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7197577Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7197620Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7197806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7197851Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7197888Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7198027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7198066Z return aot_autograd( 2025-09-07T07:34:42.7199126Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7199264Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7199332Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7199378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7199542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7199627Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7199672Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7199857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7199900Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7200084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7200199Z fx_g = _create_graph( 2025-09-07T07:34:42.7200234Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7200398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7200432Z fx_g = make_fx( 2025-09-07T07:34:42.7200511Z ^^^^^^^^ 2025-09-07T07:34:42.7200667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7200713Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7200750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7200898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7200940Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7200979Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7201137Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7202144Z t = dispatch_trace( 2025-09-07T07:34:42.7202180Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7202331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7202373Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7202411Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7202537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7202577Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7202613Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7202793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7202876Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7202915Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7203088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7203127Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7203162Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7203287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7203375Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7203410Z ^^^^^^^^^ 2025-09-07T07:34:42.7203543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7203584Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7203620Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7203768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7204780Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7204815Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7204972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7205034Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7205078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7205304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7205344Z outs_pair = fn(*args) 2025-09-07T07:34:42.7205378Z ^^^^^^^^^ 2025-09-07T07:34:42.7205556Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7205643Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7205687Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7205906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7205944Z outs_pair = fn(*args) 2025-09-07T07:34:42.7205979Z ^^^^^^^^^ 2025-09-07T07:34:42.7206156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7206218Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7206260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7206456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7206597Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7206645Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7206816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7206856Z outs_pair = fn(*args) 2025-09-07T07:34:42.7207911Z ^^^^^^^^^ 2025-09-07T07:34:42.7208103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7208148Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7208185Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7208379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7208426Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7208465Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7208591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7208632Z return handle_torch_function( 2025-09-07T07:34:42.7208669Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7208846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7208921Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7209000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7209168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7209208Z return func(*args, **kwargs) 2025-09-07T07:34:42.7209245Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7209370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7209412Z result = _engine_run_backward( 2025-09-07T07:34:42.7209449Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7209593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7209713Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7210749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7210878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7210920Z return user_fn(self, *args) 2025-09-07T07:34:42.7210956Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7211102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7211168Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7211204Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7211362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7211406Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7211443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7211569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7211612Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7211647Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7211814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7211867Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7211908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7212047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7212097Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7212135Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7212296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7212342Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7212383Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7213500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7213540Z t = dispatch_trace( 2025-09-07T07:34:42.7213574Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7213702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7213745Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7213785Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7213908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7213947Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7213982Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7214142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7214233Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7214287Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7214414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7214451Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7214487Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7214613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7214657Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7214691Z ^^^^^^^^^ 2025-09-07T07:34:42.7214841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7214889Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7214923Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7214964Z File "", line 1, in 
2025-09-07T07:34:42.7216063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7216140Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7216186Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7216323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7216388Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7216426Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7216688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7216731Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7216767Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7216938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7216984Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7217019Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7217162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7217206Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7217241Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7217378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7217466Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7217512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7217638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7217701Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7217744Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7218832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7218871Z leaves = list(leaves) 2025-09-07T07:34:42.7218937Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7219062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7219099Z return func(x) 2025-09-07T07:34:42.7219132Z ^^^^^^^ 2025-09-07T07:34:42.7219271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7219336Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7219379Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7219562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7219621Z return func(*args, **kwargs) 2025-09-07T07:34:42.7219657Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7219839Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7219923Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7219926Z 2025-09-07T07:34:42.7220134Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7220136Z 2025-09-07T07:34:42.7220138Z 2025-09-07T07:34:42.7220210Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7220404Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7220408Z 2025-09-07T07:34:42.7220495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7220571Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7220605Z inline_call [] 2025-09-07T07:34:42.7220664Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7220739Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7221792Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7222052Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7222166Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7222217Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7222372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7222457Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7222590Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7222709Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7222784Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7222817Z inline_call [] 2025-09-07T07:34:42.7222873Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7222946Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7223015Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7223272Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7223385Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7223450Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7223601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7223687Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7223817Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7223935Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7224006Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7225022Z inline_call [] 2025-09-07T07:34:42.7225095Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7225168Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7225237Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7225489Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7225602Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7225651Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7225799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7225882Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7226012Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7226130Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7226347Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ce157019e5db02d2.xml - 2025-09-07T07:34:42.7226406Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7226841Z FAILED [0.7178s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7226925Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7226927Z 2025-09-07T07:34:42.7227135Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7227137Z 2025-09-07T07:34:42.7227139Z 2025-09-07T07:34:42.7227210Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7227402Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7227406Z 2025-09-07T07:34:42.7227490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7227550Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.7227620Z ============= 1 failed, 1 passed, 74 deselected, 2 rerun in 5.13s ============== 2025-09-07T07:34:42.7228624Z Got exit code 1 2025-09-07T07:34:42.7228664Z Retrying single test... 2025-09-07T07:34:42.7229090Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7229129Z import pkg_resources 2025-09-07T07:34:42.7229322Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ace31a47468ad047.xml 2025-09-07T07:34:42.7229382Z ============================= test session starts ============================== 2025-09-07T07:34:42.7229496Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7229535Z cachedir: .pytest_cache 2025-09-07T07:34:42.7229693Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7229737Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7229794Z configfile: pytest.ini 2025-09-07T07:34:42.7229973Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7230051Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7230281Z stepcurrent: skipping 75 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7230324Z Running 1 items in this shard 2025-09-07T07:34:42.7230326Z 2025-09-07T07:34:42.7230521Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9184s] [100%] 2025-09-07T07:34:42.7230713Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7141s] [100%] 2025-09-07T07:34:42.7230882Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True FAILED [0.7028s] [100%] 2025-09-07T07:34:42.7230885Z 2025-09-07T07:34:42.7230935Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7231044Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7231088Z Traceback (most recent call last): 2025-09-07T07:34:42.7231265Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7232259Z self._run_test( 2025-09-07T07:34:42.7232374Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7232430Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7232470Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7232606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7232655Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7232695Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7232847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.7232894Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7232932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7233072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7233115Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7233151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7233295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7233375Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7233417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7233569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7233616Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7233782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7233838Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7233878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7234979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7235032Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7235071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7235185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7235286Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7235329Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7235457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7235519Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.7235562Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7235701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7235745Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7235781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7235918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7235958Z return aot_autograd( 2025-09-07T07:34:42.7235994Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7236129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7236199Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7236247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7236412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7236589Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7236634Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7236818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7237828Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7238017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7238057Z fx_g = _create_graph( 2025-09-07T07:34:42.7238091Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7238255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7238290Z fx_g = make_fx( 
2025-09-07T07:34:42.7238325Z ^^^^^^^^ 2025-09-07T07:34:42.7238476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7238521Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7238560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7238706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7238750Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7238788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7238947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7238985Z t = dispatch_trace( 2025-09-07T07:34:42.7239045Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7239158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7239201Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7239236Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7239361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7239401Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7240441Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7240605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7240724Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7240766Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7240890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7240929Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7240965Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7241090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.7241133Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7241168Z ^^^^^^^^^ 2025-09-07T07:34:42.7241300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7241341Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7241376Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7241527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7241576Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7241610Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7241768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7241831Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7241896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7242071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7242110Z outs_pair = fn(*args) 2025-09-07T07:34:42.7242145Z ^^^^^^^^^ 2025-09-07T07:34:42.7243278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7243349Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7243392Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7243570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7243607Z outs_pair = fn(*args) 2025-09-07T07:34:42.7243642Z ^^^^^^^^^ 2025-09-07T07:34:42.7243823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7243883Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7243926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7244121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7244193Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7244238Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7244409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7244465Z outs_pair = fn(*args) 2025-09-07T07:34:42.7244499Z ^^^^^^^^^ 2025-09-07T07:34:42.7244690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7244735Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7244771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7244940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7244985Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7245037Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7245180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7246177Z return handle_torch_function( 2025-09-07T07:34:42.7246213Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7246356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7246431Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7246476Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7246704Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7246746Z return func(*args, **kwargs) 2025-09-07T07:34:42.7246781Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7246906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7246949Z result = _engine_run_backward( 2025-09-07T07:34:42.7246985Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7247131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7247253Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7247328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7247455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7247496Z return user_fn(self, *args) 2025-09-07T07:34:42.7247533Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7247677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7247723Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7247758Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7247915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7248926Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7248964Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7249087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7249130Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7249165Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7249330Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7249381Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7249421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7249561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7249611Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7249650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7249831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7249880Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7249919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7250078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7250116Z t = dispatch_trace( 2025-09-07T07:34:42.7250150Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7250263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7250324Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7250380Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7250505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7250544Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7251548Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7251709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7251790Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7251830Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7251954Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7251992Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7252027Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7252153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7252195Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7252229Z ^^^^^^^^^ 2025-09-07T07:34:42.7252378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7252428Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7252461Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7252523Z File "", line 1, in 2025-09-07T07:34:42.7252668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7252745Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7252790Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7252926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7252974Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7253013Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7253202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7254201Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7254237Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7254412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7254456Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7254493Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7254636Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7254678Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7254715Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7254850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7254937Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7254998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7255123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7255186Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7255229Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7255356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7255394Z leaves = list(leaves) 2025-09-07T07:34:42.7255428Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7255567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7255622Z return func(x) 2025-09-07T07:34:42.7255655Z ^^^^^^^ 2025-09-07T07:34:42.7255792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7255858Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7256936Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7257107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7257148Z return func(*args, **kwargs) 2025-09-07T07:34:42.7257184Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7257366Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7257451Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.7257455Z 2025-09-07T07:34:42.7257662Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7257664Z 2025-09-07T07:34:42.7257668Z 2025-09-07T07:34:42.7257741Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7257933Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7257966Z 2025-09-07T07:34:42.7258051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7258126Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7258161Z inline_call [] 2025-09-07T07:34:42.7258217Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7258252Z inductor [] 2025-09-07T07:34:42.7258327Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7258399Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7258660Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7258776Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7258828Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7258978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7259064Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7260160Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7260283Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7260391Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7260434Z Traceback (most recent call last): 2025-09-07T07:34:42.7260604Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7260639Z self._run_test( 2025-09-07T07:34:42.7260755Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7260809Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7260849Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7260980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7261026Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7261081Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7261248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7261294Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7261335Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7261470Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7261515Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7261553Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7261695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7261775Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7261813Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7261966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7262966Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7263118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7263173Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7263213Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7263373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7263425Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7263463Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7263579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7263644Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7263691Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7263818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7263881Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7263924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7264064Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7264109Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7264146Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7264282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7264322Z return aot_autograd( 2025-09-07T07:34:42.7264356Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7264491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7264561Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7265558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7265738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7265822Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7265869Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7266053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7266095Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7266281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7266334Z fx_g = _create_graph( 2025-09-07T07:34:42.7266383Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7266615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7266651Z fx_g = make_fx( 2025-09-07T07:34:42.7266684Z ^^^^^^^^ 2025-09-07T07:34:42.7266836Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7266882Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7266919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7267066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7267108Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7267145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7267304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7267344Z t = dispatch_trace( 2025-09-07T07:34:42.7267377Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7267490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7268500Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7268538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7268662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7268730Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7268765Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7268929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7269007Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7269051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7269175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7269214Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7269248Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7269374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7269415Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7269452Z ^^^^^^^^^ 2025-09-07T07:34:42.7269584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7269623Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7269660Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7269808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7269858Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7269892Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7270050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7270113Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7271137Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7271314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7271356Z outs_pair = fn(*args) 2025-09-07T07:34:42.7271391Z ^^^^^^^^^ 2025-09-07T07:34:42.7271564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7271630Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7271676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7271884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7271924Z outs_pair = fn(*args) 2025-09-07T07:34:42.7271958Z ^^^^^^^^^ 2025-09-07T07:34:42.7272136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7272196Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7272239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7272432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7272502Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7272548Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7272722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7272761Z outs_pair = fn(*args) 2025-09-07T07:34:42.7272795Z ^^^^^^^^^ 2025-09-07T07:34:42.7272986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7273049Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7274040Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7274211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7274257Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7274294Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7274419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7274465Z return handle_torch_function( 2025-09-07T07:34:42.7274501Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7274642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7274719Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7274764Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7274935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7274976Z return func(*args, **kwargs) 2025-09-07T07:34:42.7275012Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7275134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7275176Z result = _engine_run_backward( 2025-09-07T07:34:42.7275213Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7275361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7275480Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7275553Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7275678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7275721Z return user_fn(self, *args) 2025-09-07T07:34:42.7275757Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7276916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7276960Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7276997Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7277205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7277250Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7277286Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7277410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7277449Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7277487Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7277652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7277703Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7277744Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7277884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7277934Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7277973Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7278135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7278182Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7278222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7278380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7278440Z t = dispatch_trace( 2025-09-07T07:34:42.7278473Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7279547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7279590Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7279627Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7279750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7279792Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7279827Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7279987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7280065Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7280106Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7280293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7280332Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7280366Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7280493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7280534Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7280570Z ^^^^^^^^^ 2025-09-07T07:34:42.7280721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7280770Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7280803Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7280869Z File "", line 1, in 2025-09-07T07:34:42.7281014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7281094Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7281139Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7282235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7282286Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7282323Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7282548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7282592Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7282628Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7282800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7282846Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7282882Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7283026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7283068Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7283104Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7283237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7283326Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7283372Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7283498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7283557Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7283601Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7283747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7283786Z leaves = list(leaves) 2025-09-07T07:34:42.7283820Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7284890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7284926Z return func(x) 2025-09-07T07:34:42.7284961Z ^^^^^^^ 2025-09-07T07:34:42.7285099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in <lambda> 2025-09-07T07:34:42.7285164Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7285206Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7285373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7285416Z return func(*args, **kwargs) 2025-09-07T07:34:42.7285451Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7285632Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7285716Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7285720Z 2025-09-07T07:34:42.7285926Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch).
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7285930Z 2025-09-07T07:34:42.7285931Z 2025-09-07T07:34:42.7286004Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7286211Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7286214Z 2025-09-07T07:34:42.7286300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7286374Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7286411Z inline_call [] 2025-09-07T07:34:42.7286466Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7286563Z inductor [] 2025-09-07T07:34:42.7286637Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7286709Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7287008Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7288092Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7288145Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7288296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7288382Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7288514Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7288632Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7288703Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7288739Z inline_call [] 2025-09-07T07:34:42.7288794Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7288867Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7288938Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7289192Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7291989Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7292039Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7292187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7292272Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7292405Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7292522Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7292573Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7292682Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7292726Z Traceback (most recent call last): 2025-09-07T07:34:42.7293870Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7293906Z self._run_test( 2025-09-07T07:34:42.7294019Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7294073Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7294116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7294249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7294296Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7294334Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7294512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7294561Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7294600Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7294735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7294779Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7294816Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7294958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7295068Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7295108Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7295259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7295305Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7295456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7295511Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7295552Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7296721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7296774Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7296814Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7296932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7296998Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7297043Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7297167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7297259Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7297300Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7297439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7297483Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7297521Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7297658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7297700Z return aot_autograd( 2025-09-07T07:34:42.7297734Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7297871Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7297941Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7297986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7298147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7298229Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7298274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7299422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7299469Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7299656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7299695Z fx_g = _create_graph( 2025-09-07T07:34:42.7299758Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7299922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7299959Z fx_g = make_fx( 2025-09-07T07:34:42.7299991Z ^^^^^^^^ 2025-09-07T07:34:42.7300144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7300189Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7300226Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7300389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7300450Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7300488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7300647Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7300685Z t = dispatch_trace( 2025-09-07T07:34:42.7300719Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7300833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7300874Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7300910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7301033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7301073Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7302065Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7302231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7302308Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7302349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7302474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7302531Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7302565Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7302691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7302732Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7302766Z ^^^^^^^^^ 2025-09-07T07:34:42.7302898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7302939Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7302975Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7303123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7303172Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7303208Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7303365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7303429Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7303474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7303647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7303688Z outs_pair = fn(*args) 2025-09-07T07:34:42.7303722Z ^^^^^^^^^ 2025-09-07T07:34:42.7304851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7304918Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7304963Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7305150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7305193Z outs_pair = fn(*args) 2025-09-07T07:34:42.7305226Z ^^^^^^^^^ 2025-09-07T07:34:42.7305404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7305463Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7305505Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7305714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7305796Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7305841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7306014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7306053Z outs_pair = fn(*args) 2025-09-07T07:34:42.7306089Z ^^^^^^^^^ 2025-09-07T07:34:42.7306278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7306324Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7306360Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7306694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7306742Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7306779Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7307894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7307940Z return handle_torch_function( 2025-09-07T07:34:42.7307976Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7308148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7308222Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7308267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7308435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7308474Z return func(*args, **kwargs) 2025-09-07T07:34:42.7308511Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7308636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7308677Z result = _engine_run_backward( 2025-09-07T07:34:42.7308712Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7308860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7308983Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7309034Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7309159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7309201Z return user_fn(self, *args) 2025-09-07T07:34:42.7309237Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7309384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7309426Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7309463Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7309638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7310644Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7310683Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7310807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7310848Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7310883Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7311047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7311121Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7311177Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7311315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7311364Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7311404Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7311566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7311616Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7311654Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7311812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7311850Z t = dispatch_trace( 2025-09-07T07:34:42.7311884Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7312000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7312042Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7312079Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7312203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7313195Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7313230Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7313409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7313487Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7313529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7313653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7313692Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7313728Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7313855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7313896Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7313931Z ^^^^^^^^^ 2025-09-07T07:34:42.7314080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7314133Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7314166Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7314208Z File "", line 1, in 
2025-09-07T07:34:42.7314351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7314428Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7314474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7314612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7314659Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7314696Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7314902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7315896Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7315934Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7316106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7316150Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7316186Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7316328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7316398Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7316435Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7316626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7316715Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7316760Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7316888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7316947Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7316990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7317116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7317155Z leaves = list(leaves) 2025-09-07T07:34:42.7317190Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7317315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7317350Z return func(x) 2025-09-07T07:34:42.7317383Z ^^^^^^^ 2025-09-07T07:34:42.7317521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7318544Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7318620Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7318788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7318828Z return func(*args, **kwargs) 2025-09-07T07:34:42.7318864Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7319045Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7319131Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7319135Z 2025-09-07T07:34:42.7319341Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7319344Z 2025-09-07T07:34:42.7319346Z 2025-09-07T07:34:42.7319419Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7319613Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7319616Z 2025-09-07T07:34:42.7319701Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7319774Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7319810Z inline_call [] 2025-09-07T07:34:42.7319867Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7319902Z inductor [] 2025-09-07T07:34:42.7319976Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7320046Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7320373Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7320489Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7320541Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7320692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7320778Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7321912Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7322033Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7322108Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7322142Z inline_call [] 2025-09-07T07:34:42.7322198Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7322273Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7322342Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7322595Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7322707Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7322759Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7322908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7322992Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7323122Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7323257Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7323326Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7323361Z inline_call [] 2025-09-07T07:34:42.7323414Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7323486Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7323554Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7323808Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7323918Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7325144Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7325295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7325382Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7325510Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7325627Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7325845Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ace31a47468ad047.xml - 2025-09-07T07:34:42.7325902Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7326274Z FAILED [0.7028s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7326362Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7326364Z 2025-09-07T07:34:42.7326637Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7326641Z 2025-09-07T07:34:42.7326643Z 2025-09-07T07:34:42.7326714Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7326946Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7326948Z 2025-09-07T07:34:42.7327032Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7327092Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.7327159Z ================== 1 failed, 245 deselected, 2 rerun in 2.53s ================== 2025-09-07T07:34:42.7327194Z Got exit code 1 2025-09-07T07:34:42.7327234Z Retrying single test... 2025-09-07T07:34:42.7327657Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7327698Z import pkg_resources 2025-09-07T07:34:42.7327869Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0080867fd524e935.xml 2025-09-07T07:34:42.7327925Z ============================= test session starts ============================== 2025-09-07T07:34:42.7329014Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7329055Z cachedir: .pytest_cache 2025-09-07T07:34:42.7329245Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7329289Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7329327Z configfile: pytest.ini 2025-09-07T07:34:42.7329489Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7329565Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7329796Z stepcurrent: skipping 75 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7329837Z Running 1 items in this shard 2025-09-07T07:34:42.7329841Z 2025-09-07T07:34:42.7330034Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9427s] [100%] 2025-09-07T07:34:42.7330227Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7131s] [100%] 2025-09-07T07:34:42.7330393Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True FAILED [0.7303s] [100%] 2025-09-07T07:34:42.7330396Z 2025-09-07T07:34:42.7330445Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7330554Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7330598Z Traceback (most recent call last): 2025-09-07T07:34:42.7330750Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7330800Z self._run_test( 2025-09-07T07:34:42.7330916Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7330972Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7331014Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7331148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7331195Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7332192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7332363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 
2025-09-07T07:34:42.7332423Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7332462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7332598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7332642Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7332679Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7332825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7332905Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7332944Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7333095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7333143Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7333292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7333346Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7333385Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7333528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7333597Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7333636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7333752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7333820Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7333864Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7334949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7335015Z return lookup_backend(self.backend)(gm, 
example_inputs) 2025-09-07T07:34:42.7335057Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7335197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7335241Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7335282Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7335420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7335459Z return aot_autograd( 2025-09-07T07:34:42.7335493Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7335630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7335699Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7335747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7335906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7336005Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7336050Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7336236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7336279Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7336466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7336567Z fx_g = _create_graph( 2025-09-07T07:34:42.7336604Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7336812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7336848Z fx_g = make_fx( 
2025-09-07T07:34:42.7337847Z ^^^^^^^^ 2025-09-07T07:34:42.7338004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7338050Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7338089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7338235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7338278Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7338314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7338472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7338511Z t = dispatch_trace( 2025-09-07T07:34:42.7338546Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7338660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7338701Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7338736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7338863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7338930Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7338966Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7339128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7339206Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7339248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7339371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7339413Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7339447Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7340532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", 
line 868, in trace 2025-09-07T07:34:42.7340575Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7340611Z ^^^^^^^^^ 2025-09-07T07:34:42.7340743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7340786Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7340820Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7340970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7341019Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7341053Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7341212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7341275Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7341318Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7341515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7341555Z outs_pair = fn(*args) 2025-09-07T07:34:42.7341591Z ^^^^^^^^^ 2025-09-07T07:34:42.7341763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7341829Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7341874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7342060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7342114Z outs_pair = fn(*args) 2025-09-07T07:34:42.7342148Z ^^^^^^^^^ 2025-09-07T07:34:42.7343282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7343343Z return 
_functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7343385Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7343583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7343653Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7343699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7343873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7343913Z outs_pair = fn(*args) 2025-09-07T07:34:42.7343948Z ^^^^^^^^^ 2025-09-07T07:34:42.7344139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7344185Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7344221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7344409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7344454Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7344492Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7344617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7344659Z return handle_torch_function( 2025-09-07T07:34:42.7344697Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7344839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7344912Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7344958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7345125Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7346123Z return func(*args, **kwargs) 2025-09-07T07:34:42.7346160Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7346284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7346325Z result = _engine_run_backward( 2025-09-07T07:34:42.7346361Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7346581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7346705Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7346754Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7346905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7346949Z return user_fn(self, *args) 2025-09-07T07:34:42.7346986Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7347131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7347174Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7347212Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7347369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7347453Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7347490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7347613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7347653Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7347689Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7347853Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7348872Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7348913Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7349050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7349099Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7349138Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7349301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7349349Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7349387Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7349548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7349610Z t = dispatch_trace( 2025-09-07T07:34:42.7349645Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7349758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7349801Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7349836Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7349961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7350001Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7350036Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7350198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7350276Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7350319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7350441Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7350481Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7351530Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7351657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7351698Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7351733Z ^^^^^^^^^ 2025-09-07T07:34:42.7351934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7351986Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7352020Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7352062Z File "", line 1, in 2025-09-07T07:34:42.7352221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7352300Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7352347Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7352483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7352529Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7352568Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7352761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7352832Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7352868Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7353041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7353085Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7353122Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7353269Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7354327Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7354364Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7354499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7354587Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7354635Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7354760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7354818Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7354863Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7354988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7355047Z leaves = list(leaves) 2025-09-07T07:34:42.7355081Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7355204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7355238Z return func(x) 2025-09-07T07:34:42.7355271Z ^^^^^^^ 2025-09-07T07:34:42.7355408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7355475Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7355515Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7355683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7355725Z return func(*args, **kwargs) 2025-09-07T07:34:42.7355761Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7355989Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7356075Z RuntimeError: only Tensors of 
floating point dtype can require gradients 2025-09-07T07:34:42.7356077Z 2025-09-07T07:34:42.7357413Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7357417Z 2025-09-07T07:34:42.7357419Z 2025-09-07T07:34:42.7357495Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7357688Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7357732Z 2025-09-07T07:34:42.7357844Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7357919Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7357957Z inline_call [] 2025-09-07T07:34:42.7358013Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7358048Z inductor [] 2025-09-07T07:34:42.7358121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7358192Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7358474Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7358606Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7358657Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7358862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7358949Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7359080Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7359199Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7359307Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7359350Z Traceback (most recent call last): 2025-09-07T07:34:42.7359501Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7359536Z self._run_test( 2025-09-07T07:34:42.7360688Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7360746Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7360786Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7360944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7360991Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7361029Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7361179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7361224Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7361266Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7361403Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7361446Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7361484Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7361627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7361712Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7361749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7361901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7361948Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7362098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7362153Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7362193Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7362334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7362397Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7363404Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7363525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7363591Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7363635Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7363760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7363824Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7363895Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7364035Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7364078Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7364117Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7364255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7364295Z return aot_autograd( 2025-09-07T07:34:42.7364330Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7364466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7364534Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7364580Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7364794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7364877Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7364924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7365106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7365164Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7366355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7366396Z fx_g = _create_graph( 2025-09-07T07:34:42.7366432Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7366668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7366704Z fx_g = make_fx( 2025-09-07T07:34:42.7366739Z ^^^^^^^^ 2025-09-07T07:34:42.7366937Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7366983Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7367022Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7367168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7367213Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7367250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7367407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7367446Z t = dispatch_trace( 2025-09-07T07:34:42.7367479Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7367593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7367637Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7367673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7367796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7367837Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7367898Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7368064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7369174Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7369216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7369340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7369379Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7369413Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7369627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7369670Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7369704Z ^^^^^^^^^ 2025-09-07T07:34:42.7369840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7369879Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7369915Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7370067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7370117Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7370150Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7370307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7370368Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7370414Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7370589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7370629Z outs_pair = fn(*args) 2025-09-07T07:34:42.7370664Z ^^^^^^^^^ 2025-09-07T07:34:42.7370835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7370922Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7371933Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7372109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7372147Z outs_pair = fn(*args) 2025-09-07T07:34:42.7372182Z ^^^^^^^^^ 2025-09-07T07:34:42.7372361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7372421Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7372464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7372660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7372734Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7372779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7372952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7372990Z outs_pair = fn(*args) 2025-09-07T07:34:42.7373076Z ^^^^^^^^^ 2025-09-07T07:34:42.7373269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7373314Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7373350Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7373577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7373624Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7373660Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7373788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7373829Z return handle_torch_function( 2025-09-07T07:34:42.7374875Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7375018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7375216Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7375262Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7375431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7375471Z return func(*args, **kwargs) 2025-09-07T07:34:42.7375507Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7375631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7375672Z result = _engine_run_backward( 2025-09-07T07:34:42.7375707Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7375853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7375972Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7376024Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7376150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7376192Z return user_fn(self, *args) 2025-09-07T07:34:42.7376229Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7376374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7376432Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7376468Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7376697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7376743Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7376779Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7377932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7377973Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7378009Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7378174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7378225Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7378267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7378403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7378452Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7378490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7378652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7378699Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7378740Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7378897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7378936Z t = dispatch_trace( 2025-09-07T07:34:42.7379001Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7379115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7379157Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7379194Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7379368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7379407Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7379442Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7380603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7380777Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7380819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7380944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7380983Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7381017Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7381144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7381184Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7381218Z ^^^^^^^^^ 2025-09-07T07:34:42.7381368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7381416Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7381454Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7381496Z File "", line 1, in 2025-09-07T07:34:42.7381639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7381715Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7381762Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7381943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7382010Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7382048Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7382240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7382283Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7382319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7383462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7383509Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7383545Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7383690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7383735Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7383771Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7383904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7383992Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7384037Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7384162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7384224Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7384267Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7384409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7384449Z leaves = list(leaves) 2025-09-07T07:34:42.7384483Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7384609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7384645Z return func(x) 2025-09-07T07:34:42.7384677Z ^^^^^^^ 2025-09-07T07:34:42.7384816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7384881Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7384923Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7385120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7386170Z return func(*args, **kwargs) 2025-09-07T07:34:42.7386206Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7386387Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7386472Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7386475Z 2025-09-07T07:34:42.7386754Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7386757Z 2025-09-07T07:34:42.7386759Z 2025-09-07T07:34:42.7386831Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7387062Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7387065Z 2025-09-07T07:34:42.7387151Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7387228Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7387264Z inline_call [] 2025-09-07T07:34:42.7387322Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7387386Z inductor [] 2025-09-07T07:34:42.7387461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7387532Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7387790Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7387905Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7387958Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7388109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7388196Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7388326Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7388447Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7389539Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7389575Z inline_call [] 2025-09-07T07:34:42.7389630Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7389703Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7389780Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7390088Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7390224Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7390275Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7390426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7390512Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7390639Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7390758Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7390845Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7390953Z _ WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7391118Z Traceback (most recent call last): 2025-09-07T07:34:42.7391309Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1489, in test_while_loop_with_sym_expr_cond 2025-09-07T07:34:42.7391345Z self._run_test( 2025-09-07T07:34:42.7391459Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7391513Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7391553Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7391728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7392810Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7392851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7393003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7393049Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7393087Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7393225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7393290Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7393328Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7393471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7393552Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7393591Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7393743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7393790Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7393940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7393994Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7394035Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7394180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7394280Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7394319Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7394435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7394499Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7394544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7395674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7395740Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7395798Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7395939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7395985Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7396023Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7396162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7396201Z return aot_autograd( 2025-09-07T07:34:42.7396236Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7396373Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7396469Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7396586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7396747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7396831Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7396878Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7397060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7397103Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7397290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7397332Z fx_g = _create_graph( 2025-09-07T07:34:42.7397367Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7397531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7398594Z fx_g = make_fx( 2025-09-07T07:34:42.7398630Z ^^^^^^^^ 2025-09-07T07:34:42.7398782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7398866Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7398904Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7399051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7399093Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7399130Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7399289Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7399328Z t = dispatch_trace( 2025-09-07T07:34:42.7399361Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7399519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7399561Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7399597Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7399726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7399766Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7399801Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7399964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7400042Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7400084Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7400312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7400350Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7401415Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7401569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7401611Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7401647Z ^^^^^^^^^ 2025-09-07T07:34:42.7401816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7401856Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7401892Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7402040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7402107Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7402158Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7402316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7402377Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7402421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7402596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7402638Z outs_pair = fn(*args) 2025-09-07T07:34:42.7402672Z ^^^^^^^^^ 2025-09-07T07:34:42.7402846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7402913Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7402958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7403133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7403172Z outs_pair = fn(*args) 2025-09-07T07:34:42.7403206Z ^^^^^^^^^ 2025-09-07T07:34:42.7404397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7404481Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7404524Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7404717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7404788Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7404833Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7405010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7405048Z outs_pair = fn(*args) 2025-09-07T07:34:42.7405082Z ^^^^^^^^^ 2025-09-07T07:34:42.7405273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7405321Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7405357Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7405525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7405571Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7405607Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7405735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7405781Z return handle_torch_function( 2025-09-07T07:34:42.7405818Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7405960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7406049Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7406094Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7407362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7407404Z return func(*args, **kwargs) 2025-09-07T07:34:42.7407441Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7407565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7407608Z result = _engine_run_backward( 2025-09-07T07:34:42.7407668Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7407884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7408003Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7408054Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7408182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7408223Z return user_fn(self, *args) 2025-09-07T07:34:42.7408259Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7408403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7408446Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7408483Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7408690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7408735Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7408771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7408896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7408937Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7408991Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7409158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7410216Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7410256Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7410392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7410445Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7410484Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7410647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7410695Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7410784Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7410943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7410984Z t = dispatch_trace( 2025-09-07T07:34:42.7411018Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7411131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7411173Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7411209Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7411334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7411374Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7411409Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7411592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7411671Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7411715Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7411838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7412891Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7412927Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7413056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7413097Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7413150Z ^^^^^^^^^ 2025-09-07T07:34:42.7413313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7413362Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7413396Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7413438Z File "", line 1, in 
2025-09-07T07:34:42.7413582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7413706Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7413752Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7413887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7413935Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7413972Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7414166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7414209Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7414245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7414418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7414477Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7414513Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7414656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7415661Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7415698Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7415873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7415965Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7416010Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7416139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7416198Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7416245Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7416371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7416410Z leaves = list(leaves) 2025-09-07T07:34:42.7416444Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7416639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7416674Z return func(x) 2025-09-07T07:34:42.7416708Z ^^^^^^^ 2025-09-07T07:34:42.7416849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7416913Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7416955Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7417144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7417188Z return func(*args, **kwargs) 2025-09-07T07:34:42.7417268Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7417448Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7417532Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7417535Z 2025-09-07T07:34:42.7418742Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7418768Z 2025-09-07T07:34:42.7418770Z 2025-09-07T07:34:42.7418845Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7419038Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7419041Z 2025-09-07T07:34:42.7419128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7419201Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7419237Z inline_call [] 2025-09-07T07:34:42.7419294Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7419327Z inductor [] 2025-09-07T07:34:42.7419402Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7419473Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7419732Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7419847Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7419898Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7420048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7420154Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7420285Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7420406Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7420479Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7420515Z inline_call [] 2025-09-07T07:34:42.7420569Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7420688Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7421730Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7421987Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7422102Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7422152Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7422301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7422386Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7422519Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7422693Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7422780Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7422816Z inline_call [] 2025-09-07T07:34:42.7422872Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7422944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7423012Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7423267Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7423446Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1037, in forward 2025-09-07T07:34:42.7423496Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7423644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7423730Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7423857Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7423978Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7424236Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0080867fd524e935.xml - 2025-09-07T07:34:42.7425353Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7425773Z FAILED [0.7303s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7425860Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7425862Z 2025-09-07T07:34:42.7426069Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7426090Z 2025-09-07T07:34:42.7426092Z 2025-09-07T07:34:42.7426164Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7426355Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7426359Z 2025-09-07T07:34:42.7426560Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7426621Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.7426688Z ================== 1 failed, 245 deselected, 2 rerun in 2.59s ================== 2025-09-07T07:34:42.7426723Z Got exit code 1 2025-09-07T07:34:42.7426848Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.7427267Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7427309Z import pkg_resources 2025-09-07T07:34:42.7427478Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-dbe75ef0fda7903f.xml 2025-09-07T07:34:42.7427537Z ============================= test session starts ============================== 2025-09-07T07:34:42.7427649Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7427689Z cachedir: .pytest_cache 2025-09-07T07:34:42.7427863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7427909Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7427947Z configfile: pytest.ini 2025-09-07T07:34:42.7428108Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7428182Z collecting ... collected 467 items / 76 deselected / 391 selected 2025-09-07T07:34:42.7429264Z stepcurrent: skipping 76 already run items. 
2025-09-07T07:34:42.7429307Z Running 170 items in this shard 2025-09-07T07:34:42.7429309Z 2025-09-07T07:34:42.7429525Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_False_autograd_False PASSED [1.7644s] [ 0%] 2025-09-07T07:34:42.7429696Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_False PASSED [2.2049s] [ 1%] 2025-09-07T07:34:42.7429883Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_False PASSED [0.6586s] [ 1%] 2025-09-07T07:34:42.7430095Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2705s] [ 2%] 2025-09-07T07:34:42.7430301Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2485s] [ 2%] 2025-09-07T07:34:42.7430530Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True FAILED [0.2381s] [ 2%] 2025-09-07T07:34:42.7430535Z 2025-09-07T07:34:42.7430585Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7430708Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7430751Z Traceback (most recent call last): 2025-09-07T07:34:42.7430916Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7430973Z self._run_test( 2025-09-07T07:34:42.7431085Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7431142Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7431182Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7431319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7431369Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7431410Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7431562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7431609Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7432660Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7432803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7432847Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7432885Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7433029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7433110Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7433150Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7433305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7433351Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7433566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7433620Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7433662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7433804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7433855Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7433893Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7434009Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7434109Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7434154Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7434281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7434345Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7434386Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7435492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7435538Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7435575Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7435749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7435789Z return aot_autograd( 2025-09-07T07:34:42.7435826Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7435962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7436032Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7436077Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7436239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7436386Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7436432Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7436681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7436725Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7436912Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7436954Z fx_g = _create_graph( 2025-09-07T07:34:42.7436990Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7437155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7437189Z fx_g = make_fx( 2025-09-07T07:34:42.7437222Z ^^^^^^^^ 2025-09-07T07:34:42.7437376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7437422Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7438480Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7438630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7438672Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7438710Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7438870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7438908Z t = dispatch_trace( 2025-09-07T07:34:42.7438941Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7439125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7439167Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7439206Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7439332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7439371Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7439407Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7439569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7439672Z graph = tracer.trace(root, concrete_args) # 
type: ignore[arg-type] 2025-09-07T07:34:42.7439729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7439855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7439893Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7439930Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7440057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7440102Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7440136Z ^^^^^^^^^ 2025-09-07T07:34:42.7441371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7441412Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7441448Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7441596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7441650Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7441684Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7441842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7441904Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7441950Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7442155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7442240Z outs_pair = fn(*args) 2025-09-07T07:34:42.7442275Z ^^^^^^^^^ 2025-09-07T07:34:42.7442449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7442519Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7442565Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7442740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7442780Z outs_pair = fn(*args) 2025-09-07T07:34:42.7442815Z ^^^^^^^^^ 2025-09-07T07:34:42.7443031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7443092Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7443134Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7444348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7444420Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7444469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7444642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7444681Z outs_pair = fn(*args) 2025-09-07T07:34:42.7444732Z ^^^^^^^^^ 2025-09-07T07:34:42.7444923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7444972Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7445009Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7445178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7445225Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7445262Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7445416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7445458Z return handle_torch_function( 
2025-09-07T07:34:42.7445495Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7445637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7445712Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7445759Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7445927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7445968Z return func(*args, **kwargs) 2025-09-07T07:34:42.7446004Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7446128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7447255Z result = _engine_run_backward( 2025-09-07T07:34:42.7447293Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7447442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7447566Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7447616Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7447769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7447812Z return user_fn(self, *args) 2025-09-07T07:34:42.7447848Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7447992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7448038Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7448075Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7448233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7448277Z gm = 
_materialize_as_graph_inner() 2025-09-07T07:34:42.7448316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7448439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7448482Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7448517Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7448683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7448734Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7448774Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7448911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7449963Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7450003Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7450186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7450234Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7450276Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7450433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7450472Z t = dispatch_trace( 2025-09-07T07:34:42.7450505Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7450618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7450660Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7450714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7450860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7450900Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7450935Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7451095Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7451174Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7451215Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7451340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7451377Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7451462Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7451590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7451633Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7452682Z ^^^^^^^^^ 2025-09-07T07:34:42.7452833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7452884Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7452918Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7452959Z File "", line 1, in 2025-09-07T07:34:42.7453207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7453284Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7453330Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7453466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7453516Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7453556Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7453748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7453792Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7453828Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7454051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7454098Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7454134Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7454277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7454319Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7454354Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7454492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7454582Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7455647Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7455791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7455855Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7455898Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7456026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7456116Z leaves = list(leaves) 2025-09-07T07:34:42.7456152Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7456274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7456324Z return func(x) 2025-09-07T07:34:42.7456370Z ^^^^^^^ 2025-09-07T07:34:42.7456567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7456631Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7456674Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7456841Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7456884Z return func(*args, **kwargs) 2025-09-07T07:34:42.7456919Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7457101Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7457187Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7457190Z 2025-09-07T07:34:42.7457398Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7457401Z 2025-09-07T07:34:42.7457403Z 2025-09-07T07:34:42.7457474Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7457685Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7457711Z 2025-09-07T07:34:42.7458874Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7458951Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7458986Z inline_call [] 2025-09-07T07:34:42.7459043Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7459116Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7459193Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7459449Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7459564Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7459614Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7459768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7459853Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7459984Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7460103Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7460228Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7460271Z Traceback (most recent call last): 2025-09-07T07:34:42.7460460Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7460496Z self._run_test( 2025-09-07T07:34:42.7460609Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7460665Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7460705Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7460836Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7461902Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7461943Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7462136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7462183Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7462221Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7462359Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7462402Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7462442Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7462628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7462710Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7462747Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7462900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7462946Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7463097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7463150Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7463191Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7463367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7463434Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7463473Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7463589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7463653Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7463697Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7464792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7464857Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7464898Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7465040Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7465083Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7465123Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7465260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7465299Z return aot_autograd( 2025-09-07T07:34:42.7465334Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7465469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7465539Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7465586Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7465746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7465906Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7465952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7466136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7466180Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7466366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7466406Z fx_g = _create_graph( 2025-09-07T07:34:42.7466455Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7466713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7466748Z fx_g = make_fx( 2025-09-07T07:34:42.7467848Z ^^^^^^^^ 2025-09-07T07:34:42.7468001Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7468047Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7468085Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7468232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7468274Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7468311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7468472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7468510Z t = dispatch_trace( 2025-09-07T07:34:42.7468545Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7468660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7468701Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7468739Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7468863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7468927Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7468962Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7469168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7469248Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7469288Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7469416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7469454Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7470542Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7470670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7470714Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7470748Z ^^^^^^^^^ 2025-09-07T07:34:42.7470885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7470925Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7470961Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7471111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7471161Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7471195Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7471404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7471466Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7471510Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7471711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7471753Z outs_pair = fn(*args) 2025-09-07T07:34:42.7471787Z ^^^^^^^^^ 2025-09-07T07:34:42.7471959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7472027Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7472071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7472320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7472359Z outs_pair = fn(*args) 2025-09-07T07:34:42.7472393Z ^^^^^^^^^ 2025-09-07T07:34:42.7473587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7473648Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7473692Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7473886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7473956Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7474000Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7474174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7474213Z outs_pair = fn(*args) 2025-09-07T07:34:42.7474247Z ^^^^^^^^^ 2025-09-07T07:34:42.7474437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7474482Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7474538Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7474708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7474754Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7474791Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7474917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7474960Z return handle_torch_function( 2025-09-07T07:34:42.7474998Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7475140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7475214Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7475259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7476436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7476576Z return func(*args, **kwargs) 2025-09-07T07:34:42.7476613Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7476737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7476779Z result = _engine_run_backward( 2025-09-07T07:34:42.7476814Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7476964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7477085Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7477160Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7477336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7477381Z return user_fn(self, *args) 2025-09-07T07:34:42.7477417Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7477562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7477605Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7477641Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7477799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7477929Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7477965Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7478089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7478129Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7478166Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7478333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7479362Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7479405Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7479542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7479591Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7479630Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7479837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7479884Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7479925Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7480085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7480217Z t = dispatch_trace( 2025-09-07T07:34:42.7480251Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7480365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7480407Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7480443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7480566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7480607Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7480643Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7480804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7480883Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7480924Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7481052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7482124Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7482159Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7482287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7482329Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7482363Z ^^^^^^^^^ 2025-09-07T07:34:42.7482515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7482563Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7482598Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7482639Z File "", line 1, in 2025-09-07T07:34:42.7482799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7482877Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7482928Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7483107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7483155Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7483193Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7483398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7483453Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7483489Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7483697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7483743Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7483782Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7483925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7485137Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7485176Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7485311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7485402Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7485450Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7485574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7485637Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7485680Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7485876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7485915Z leaves = list(leaves) 2025-09-07T07:34:42.7485948Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7486070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7486106Z return func(x) 2025-09-07T07:34:42.7486139Z ^^^^^^^ 2025-09-07T07:34:42.7486278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7486344Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7486386Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7486615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7486657Z return func(*args, **kwargs) 2025-09-07T07:34:42.7486695Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7486875Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7486959Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7486961Z 2025-09-07T07:34:42.7488207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7488211Z 2025-09-07T07:34:42.7488213Z 2025-09-07T07:34:42.7488286Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7488575Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7488578Z 2025-09-07T07:34:42.7488665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7488742Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7488777Z inline_call [] 2025-09-07T07:34:42.7488835Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7488907Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7488979Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7489254Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7489388Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7489439Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7489591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7489677Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7489862Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7489981Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7490053Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7490088Z inline_call [] 2025-09-07T07:34:42.7490145Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7490216Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7490286Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7491567Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7491707Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7491757Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7491910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7491995Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7492128Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7492248Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7492299Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7492422Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7492466Z Traceback (most recent call last): 2025-09-07T07:34:42.7492629Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7492664Z self._run_test( 2025-09-07T07:34:42.7492776Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7492831Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7492870Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7493005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7493051Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7493090Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7493256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7493303Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7493344Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7494580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7494626Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7494664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7494805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7494920Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7494959Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7495113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7495160Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7495311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7495365Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7495405Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7495548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7495598Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7495637Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7495753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7495819Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7495863Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7496040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7496103Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7496162Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7496301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7496345Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7497460Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7497602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7497644Z return aot_autograd( 2025-09-07T07:34:42.7497682Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7497817Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7497887Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7497932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7498095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7498180Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7498225Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7498406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7498450Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7498636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7498677Z fx_g = _create_graph( 2025-09-07T07:34:42.7498760Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7498955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7498992Z fx_g = make_fx( 2025-09-07T07:34:42.7499025Z ^^^^^^^^ 2025-09-07T07:34:42.7499177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7499222Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7499261Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7499406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7500528Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7500565Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7500724Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7500763Z t = dispatch_trace( 2025-09-07T07:34:42.7500798Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7500911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7500953Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7500988Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7501113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7501153Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7501189Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7501352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7501585Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7501627Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7501754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7501792Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7501826Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7501975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7502016Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7502052Z ^^^^^^^^^ 2025-09-07T07:34:42.7502184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7503199Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7503236Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7503388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7503437Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7503471Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7503629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7503694Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7503738Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7503915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7503954Z outs_pair = fn(*args) 2025-09-07T07:34:42.7503989Z ^^^^^^^^^ 2025-09-07T07:34:42.7504205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7504277Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7504321Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7504512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7504551Z outs_pair = fn(*args) 2025-09-07T07:34:42.7504587Z ^^^^^^^^^ 2025-09-07T07:34:42.7504766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7504825Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7504867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7505062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7505159Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7506223Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7506397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7506438Z outs_pair = fn(*args) 2025-09-07T07:34:42.7506472Z ^^^^^^^^^ 2025-09-07T07:34:42.7506736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7506780Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7506862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7507031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7507079Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7507117Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7507241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7507284Z return handle_torch_function( 2025-09-07T07:34:42.7507321Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7507463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7507562Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7507608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7507778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7507820Z return func(*args, **kwargs) 2025-09-07T07:34:42.7507857Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7507984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7508024Z result = _engine_run_backward( 2025-09-07T07:34:42.7509081Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7509229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7509351Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7509403Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7509530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7509615Z return user_fn(self, *args) 2025-09-07T07:34:42.7509651Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7509796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7509840Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7509876Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7510034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7510102Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7510140Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7510264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7510304Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7510339Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7510503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7510556Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7510612Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7510767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7510817Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7510857Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7512025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7512075Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7512113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7512272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7512310Z t = dispatch_trace( 2025-09-07T07:34:42.7512345Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7512456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7512502Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7512537Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7512661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7512701Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7512737Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7512945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7513043Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7513083Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7513207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7513245Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7513281Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7513409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7513451Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7513485Z ^^^^^^^^^ 2025-09-07T07:34:42.7514589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7514638Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7514675Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7514717Z File "", line 1, in 
2025-09-07T07:34:42.7514907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7514986Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7515031Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7515169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7515217Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7515255Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7515464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7515509Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7515548Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7515719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7515763Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7515799Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7515941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7515998Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7516045Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7516228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7516315Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7516362Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7517577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7517639Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7517682Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7517808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7517846Z leaves = list(leaves) 2025-09-07T07:34:42.7517881Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7518010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7518045Z return func(x) 2025-09-07T07:34:42.7518077Z ^^^^^^^ 2025-09-07T07:34:42.7518216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7518282Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7518350Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7518518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7518558Z return func(*args, **kwargs) 2025-09-07T07:34:42.7518594Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7518775Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7518912Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7518916Z 2025-09-07T07:34:42.7519122Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7519125Z 2025-09-07T07:34:42.7519128Z 2025-09-07T07:34:42.7519200Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7519409Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7519413Z 2025-09-07T07:34:42.7519498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7520611Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7520648Z inline_call [] 2025-09-07T07:34:42.7520705Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7520781Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7520852Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7521134Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7521252Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7521305Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7521460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7521546Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7521677Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7521839Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7521911Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7521946Z inline_call [] 2025-09-07T07:34:42.7522002Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7522075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7522145Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7522401Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7522512Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7522562Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7522768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7522854Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7523959Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7524125Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7524214Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7524249Z inline_call [] 2025-09-07T07:34:42.7524304Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7524377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7524446Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7524701Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7524812Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7524861Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7525011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7525098Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7525227Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7525344Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-09-07T07:34:42.7525605Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-dbe75ef0fda7903f.xml -
2025-09-07T07:34:42.7525666Z =========================== short test summary info ============================
2025-09-07T07:34:42.7526056Z FAILED [0.2381s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised:
2025-09-07T07:34:42.7526142Z RuntimeError: only Tensors of floating point dtype can require gradients
2025-09-07T07:34:42.7526144Z
2025-09-07T07:34:42.7526348Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-09-07T07:34:42.7526351Z
2025-09-07T07:34:42.7526353Z
2025-09-07T07:34:42.7526425Z To execute this test, run the following from the base repo dir:
2025-09-07T07:34:42.7526720Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True
2025-09-07T07:34:42.7526740Z
2025-09-07T07:34:42.7527851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-07T07:34:42.7527914Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-09-07T07:34:42.7527983Z ============= 1 failed, 3 passed, 76 deselected, 2 rerun in 5.56s ==============
2025-09-07T07:34:42.7528022Z Got exit code 1
2025-09-07T07:34:42.7528060Z Retrying single test...
2025-09-07T07:34:42.7528491Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API.
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7528530Z import pkg_resources 2025-09-07T07:34:42.7528704Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4440f4be1364e522.xml 2025-09-07T07:34:42.7528760Z ============================= test session starts ============================== 2025-09-07T07:34:42.7528876Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7528915Z cachedir: .pytest_cache 2025-09-07T07:34:42.7529072Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7529178Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7529217Z configfile: pytest.ini 2025-09-07T07:34:42.7529379Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7529455Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7529702Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7529746Z Running 1 items in this shard 2025-09-07T07:34:42.7529748Z 2025-09-07T07:34:42.7529959Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.3997s] [100%] 2025-09-07T07:34:42.7530167Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2306s] [100%] 2025-09-07T07:34:42.7530354Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True FAILED [0.2265s] [100%] 2025-09-07T07:34:42.7530356Z 2025-09-07T07:34:42.7531426Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7531553Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7531597Z Traceback (most recent call last): 2025-09-07T07:34:42.7531760Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7531816Z self._run_test( 2025-09-07T07:34:42.7531974Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7532034Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7532075Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7532211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7532257Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7532296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7532448Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7532520Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7532561Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7532698Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7532743Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7532781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7532927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7533007Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7533047Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7533198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7533245Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7534407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7534463Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7534503Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7534705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7534776Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7534815Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7534930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7534996Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7535039Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7535165Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7535231Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7535274Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7535414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7535460Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7535496Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7535637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7535677Z return aot_autograd( 2025-09-07T07:34:42.7535712Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7535848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7535918Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7535967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7536127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7537306Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7537378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7537567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7537611Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7537796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7537836Z fx_g = _create_graph( 2025-09-07T07:34:42.7537871Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7538070Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7538106Z fx_g = make_fx( 2025-09-07T07:34:42.7538139Z ^^^^^^^^ 2025-09-07T07:34:42.7538337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7538383Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7538421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7538571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7538614Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7538650Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7538809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7538847Z t = dispatch_trace( 2025-09-07T07:34:42.7538882Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7538998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7539039Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7540041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7540171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7540211Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7540276Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7540487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7540566Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7540608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7540733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7540776Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.7540810Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7540936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7540977Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7541014Z ^^^^^^^^^ 2025-09-07T07:34:42.7541148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7541191Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7541226Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7541375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7541424Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7541458Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7541616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7541681Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7541725Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7542889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7542930Z outs_pair = fn(*args) 2025-09-07T07:34:42.7542967Z ^^^^^^^^^ 2025-09-07T07:34:42.7543139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7543204Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7543248Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7543423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7543490Z outs_pair = fn(*args) 2025-09-07T07:34:42.7543524Z ^^^^^^^^^ 2025-09-07T07:34:42.7543703Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7543763Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7543806Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7543999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7544069Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7544114Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7544288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7544327Z outs_pair = fn(*args) 2025-09-07T07:34:42.7544362Z ^^^^^^^^^ 2025-09-07T07:34:42.7544550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7544597Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7544633Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7545881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7545929Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7545966Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7546091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7546134Z return handle_torch_function( 2025-09-07T07:34:42.7546171Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7546315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7546388Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.7546435Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7546712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7546756Z return func(*args, **kwargs) 2025-09-07T07:34:42.7546791Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7546915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7546957Z result = _engine_run_backward( 2025-09-07T07:34:42.7546993Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7547140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7547265Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7547314Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7547463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7547507Z return user_fn(self, *args) 2025-09-07T07:34:42.7547544Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7547689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7548758Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7548796Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7548954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7549021Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7549074Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7549200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7549239Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.7549276Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7549443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7549497Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7549583Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7549722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7549771Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7549809Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7549974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7550021Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7550060Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7550222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7550259Z t = dispatch_trace( 2025-09-07T07:34:42.7550312Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7550425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7551433Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7551470Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7551594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7551632Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7551668Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7551831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7551909Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7551998Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7552121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7552162Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7552196Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7552322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7552364Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7552399Z ^^^^^^^^^ 2025-09-07T07:34:42.7552548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7552599Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7552632Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7552675Z File "", line 1, in 2025-09-07T07:34:42.7552833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7552912Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7552959Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7553096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7554155Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7554195Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7554386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7554460Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7554496Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7554668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7554713Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7554750Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.7554894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7554938Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7554973Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7555106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7555194Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7555241Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7555367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7555426Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7555471Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7555597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7555653Z leaves = list(leaves) 2025-09-07T07:34:42.7555687Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7555810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7555844Z return func(x) 2025-09-07T07:34:42.7556915Z ^^^^^^^ 2025-09-07T07:34:42.7557055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7557123Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7557164Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7557332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7557373Z return func(*args, **kwargs) 2025-09-07T07:34:42.7557410Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7557594Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7557680Z 
RuntimeError: only Tensors of floating point dtype can require gradients
2025-09-07T07:34:42.7557682Z
2025-09-07T07:34:42.7557935Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-09-07T07:34:42.7557938Z
2025-09-07T07:34:42.7557941Z
2025-09-07T07:34:42.7558015Z To execute this test, run the following from the base repo dir:
2025-09-07T07:34:42.7558223Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True
2025-09-07T07:34:42.7558225Z
2025-09-07T07:34:42.7558335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-07T07:34:42.7558409Z ----------------------------- Captured stdout call -----------------------------
2025-09-07T07:34:42.7558447Z inline_call []
2025-09-07T07:34:42.7558503Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-09-07T07:34:42.7558538Z inductor []
2025-09-07T07:34:42.7558611Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)]
2025-09-07T07:34:42.7558683Z ----------------------------- Captured stderr call -----------------------------
2025-09-07T07:34:42.7558957Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward.
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7559090Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7560235Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7560393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7560481Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7560668Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7560787Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7560909Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7560952Z Traceback (most recent call last): 2025-09-07T07:34:42.7561116Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7561151Z self._run_test( 2025-09-07T07:34:42.7561262Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7561318Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7561359Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7561522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7561569Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7561607Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7561757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7561804Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7561844Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7562029Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7562073Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7562112Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7562254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7563360Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7563399Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7563553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7563598Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7563750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7563805Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7563846Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7563986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7564054Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7564093Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7564212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7564278Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7564322Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7564447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7564510Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7564633Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7564774Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7564817Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7564855Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7564993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7565033Z return aot_autograd( 2025-09-07T07:34:42.7566087Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7566225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7566294Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7566340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7566576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7566661Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7566706Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7566890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7566959Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7567144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7567184Z fx_g = _create_graph( 2025-09-07T07:34:42.7567218Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7567427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7567463Z fx_g = make_fx( 2025-09-07T07:34:42.7567499Z ^^^^^^^^ 2025-09-07T07:34:42.7567650Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7567696Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7567736Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7567881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7567925Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7567962Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7568121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7569183Z t = dispatch_trace( 2025-09-07T07:34:42.7569218Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7569332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7569376Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7569412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7569536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7569598Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7569635Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7569798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7569878Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7569918Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7570043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7570081Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7570116Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7570324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7570367Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7570402Z ^^^^^^^^^ 2025-09-07T07:34:42.7570537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7570576Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7570614Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7570764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7571840Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7571875Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7572033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7572098Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7572143Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7572317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7572356Z outs_pair = fn(*args) 2025-09-07T07:34:42.7572392Z ^^^^^^^^^ 2025-09-07T07:34:42.7572565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7572649Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7572693Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7572867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7572906Z outs_pair = fn(*args) 2025-09-07T07:34:42.7572941Z ^^^^^^^^^ 2025-09-07T07:34:42.7573122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7573181Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7573227Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7573421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7573492Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7573537Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7573709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7573747Z outs_pair = fn(*args) 2025-09-07T07:34:42.7574743Z ^^^^^^^^^ 2025-09-07T07:34:42.7574934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7574980Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7575016Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7575201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7575250Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7575286Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7575411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7575452Z return handle_torch_function( 2025-09-07T07:34:42.7575488Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7575628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7575735Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7575781Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7575949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7575988Z return func(*args, **kwargs) 2025-09-07T07:34:42.7576024Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7576150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7576193Z result = _engine_run_backward( 2025-09-07T07:34:42.7576228Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7576373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7576607Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7577633Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7577762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7577805Z return user_fn(self, *args) 2025-09-07T07:34:42.7577840Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7577985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7578058Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7578095Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7578253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7578298Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7578333Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7578459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7578499Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7578581Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7578748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7578799Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7578842Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7578977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7579027Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7579065Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7579228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7579277Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7579316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7580485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7580525Z t = dispatch_trace( 2025-09-07T07:34:42.7580582Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7580696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7580740Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7580776Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7580899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7580938Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7580973Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7581133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7581256Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7581298Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7581421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7581460Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7581494Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7581619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7581661Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7581695Z ^^^^^^^^^ 2025-09-07T07:34:42.7581846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7581895Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7581930Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7581972Z File "", line 1, in 2025-09-07T07:34:42.7583130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7583209Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7583257Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7583393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7583461Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7583499Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7583690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7583733Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7583772Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7583944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7583988Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7584024Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7584170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7584262Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7584299Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7584433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7584521Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7584566Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7584694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7584753Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7584795Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7585899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7585940Z leaves = list(leaves) 2025-09-07T07:34:42.7585974Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7586100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7586135Z return func(x) 2025-09-07T07:34:42.7586168Z ^^^^^^^ 2025-09-07T07:34:42.7586306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7586369Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7586425Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7586697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7586739Z return func(*args, **kwargs) 2025-09-07T07:34:42.7586774Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7586957Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7587088Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7587091Z 2025-09-07T07:34:42.7587297Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7587299Z 2025-09-07T07:34:42.7587302Z 2025-09-07T07:34:42.7587373Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7587585Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7587588Z 2025-09-07T07:34:42.7587673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7587750Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7587784Z inline_call [] 2025-09-07T07:34:42.7587841Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7587895Z inductor [] 2025-09-07T07:34:42.7588955Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7589027Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7589332Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7589449Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7589500Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7589654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7589742Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7589874Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7589995Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7590065Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7590100Z inline_call [] 2025-09-07T07:34:42.7590156Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7590228Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7590299Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7590552Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7590685Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7590738Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7590886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7590971Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7591100Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7591218Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7592271Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7592396Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7592439Z Traceback (most recent call last): 2025-09-07T07:34:42.7592650Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7592686Z self._run_test( 2025-09-07T07:34:42.7592799Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7592854Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7592895Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7593026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7593073Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7593113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7593265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7593310Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7593350Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7593487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7593548Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7593587Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7593729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7593810Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7593848Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7594001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7594046Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7595166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7595219Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7595259Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7595404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7595455Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7595493Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7595610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7595674Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7595722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7595847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7595911Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7595967Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7596108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7596154Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7596192Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7596329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7596368Z return aot_autograd( 2025-09-07T07:34:42.7596403Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7596619Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7596704Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7596749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7597935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7598020Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7598066Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7598249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7598292Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7598477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7598519Z fx_g = _create_graph( 2025-09-07T07:34:42.7598553Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7598719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7598754Z fx_g = make_fx( 2025-09-07T07:34:42.7598833Z ^^^^^^^^ 2025-09-07T07:34:42.7598986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7599057Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7599095Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7599241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7599283Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7599320Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7599480Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7599518Z t = dispatch_trace( 2025-09-07T07:34:42.7599551Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7599665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7599706Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7600833Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7600962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7601002Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7601038Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7601200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7601277Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7601321Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7601445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7601483Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7601517Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7601666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7601710Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7601851Z ^^^^^^^^^ 2025-09-07T07:34:42.7601985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7602025Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7602061Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7602211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7602299Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7602333Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7602491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7602552Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7603622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7603800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7603839Z outs_pair = fn(*args) 2025-09-07T07:34:42.7603874Z ^^^^^^^^^ 2025-09-07T07:34:42.7604045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7604111Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7604157Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7604331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7604370Z outs_pair = fn(*args) 2025-09-07T07:34:42.7604405Z ^^^^^^^^^ 2025-09-07T07:34:42.7604638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7604717Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7604760Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7604952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7605022Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7605070Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7605245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7605284Z outs_pair = fn(*args) 2025-09-07T07:34:42.7605317Z ^^^^^^^^^ 2025-09-07T07:34:42.7605509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7605554Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7605591Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7606958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7607008Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7607045Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7607175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7607217Z return handle_torch_function( 2025-09-07T07:34:42.7607253Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7607422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7607497Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7607544Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7607712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7607752Z return func(*args, **kwargs) 2025-09-07T07:34:42.7607837Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7607960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7608022Z result = _engine_run_backward( 2025-09-07T07:34:42.7608076Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7608224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7608346Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7608397Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7608527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7608568Z return user_fn(self, *args) 2025-09-07T07:34:42.7608604Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7609792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7609836Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7609874Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7610033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7610077Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7610113Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7610237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7610303Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7610338Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7610550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7610602Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7610643Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7610778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7610831Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7610869Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7611030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7611078Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7611117Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7611277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7611315Z t = dispatch_trace( 2025-09-07T07:34:42.7611349Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7611462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7612465Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7612504Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7612631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7612670Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7612704Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7612882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7612962Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7613005Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7613129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7613167Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7613201Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7613328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7613385Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7613433Z ^^^^^^^^^ 2025-09-07T07:34:42.7613583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7613632Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7613666Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7613709Z File "", line 1, in 
2025-09-07T07:34:42.7613852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7613931Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7613977Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7614112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7615177Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7615216Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7615411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7615453Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7615491Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7615716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7615780Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7615816Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7615959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7616001Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7616037Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7616169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7616261Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7616306Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7616432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7616556Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7616602Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7616729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7616766Z leaves = list(leaves) 2025-09-07T07:34:42.7616801Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7616924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7617931Z return func(x) 2025-09-07T07:34:42.7617965Z ^^^^^^^ 2025-09-07T07:34:42.7618106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7618170Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7618236Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7618403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7618446Z return func(*args, **kwargs) 2025-09-07T07:34:42.7618482Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7618711Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7618796Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7618798Z 2025-09-07T07:34:42.7619028Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7619047Z 2025-09-07T07:34:42.7619048Z 2025-09-07T07:34:42.7619121Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7619331Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7619334Z 2025-09-07T07:34:42.7619419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7619494Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7619528Z inline_call [] 2025-09-07T07:34:42.7619584Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7619618Z inductor [] 2025-09-07T07:34:42.7619692Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7619765Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7620022Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7620138Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7621162Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7621337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7621423Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7621553Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7621673Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7621747Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7621782Z inline_call [] 2025-09-07T07:34:42.7621837Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7621911Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7621981Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7622239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7622353Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7622404Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7622552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7622639Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7622768Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7622907Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7622976Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7623011Z inline_call [] 2025-09-07T07:34:42.7623064Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7623136Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7624227Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7624484Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7624632Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7624683Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7624831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7624914Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7625044Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7625165Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7625381Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-4440f4be1364e522.xml - 2025-09-07T07:34:42.7625440Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7625817Z FAILED [0.2265s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7625902Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7625904Z 2025-09-07T07:34:42.7626126Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7626128Z 2025-09-07T07:34:42.7626130Z 2025-09-07T07:34:42.7626202Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7626409Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7626413Z 2025-09-07T07:34:42.7626569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7626628Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.7626695Z ================== 1 failed, 245 deselected, 2 rerun in 1.12s ================== 2025-09-07T07:34:42.7626731Z Got exit code 1 2025-09-07T07:34:42.7626771Z Retrying single test... 2025-09-07T07:34:42.7627239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7628266Z import pkg_resources 2025-09-07T07:34:42.7628435Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-003e04e678f1dede.xml 2025-09-07T07:34:42.7628494Z ============================= test session starts ============================== 2025-09-07T07:34:42.7628607Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7628647Z cachedir: .pytest_cache 2025-09-07T07:34:42.7628826Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7628873Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7628914Z configfile: pytest.ini 2025-09-07T07:34:42.7629074Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7629149Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7629394Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7629473Z Running 1 items in this shard 2025-09-07T07:34:42.7629476Z 2025-09-07T07:34:42.7629687Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.4295s] [100%] 2025-09-07T07:34:42.7629896Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True ('RERUN', {'yellow': True}) [0.2732s] [100%] 2025-09-07T07:34:42.7630080Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True FAILED [0.2593s] [100%] 2025-09-07T07:34:42.7630083Z 2025-09-07T07:34:42.7630130Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7630253Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7630297Z Traceback (most recent call last): 2025-09-07T07:34:42.7630462Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7630497Z self._run_test( 2025-09-07T07:34:42.7630611Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7630668Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7631675Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7631835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7631881Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7631919Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7632071Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7632118Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7632161Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7632298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7632342Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7632378Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7632522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7632606Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7632699Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7632852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7632899Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7633049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7633105Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7633145Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7633288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7633352Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7633391Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7634476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7634545Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7634588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7634716Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7634779Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7634851Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7634992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7635036Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7635074Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7635212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7635254Z return aot_autograd( 2025-09-07T07:34:42.7635288Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7635423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7635492Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7635537Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7635699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7635782Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7635827Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7636011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7636071Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7636256Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7636295Z fx_g = _create_graph( 2025-09-07T07:34:42.7637368Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7637533Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7637571Z fx_g = make_fx( 2025-09-07T07:34:42.7637605Z ^^^^^^^^ 2025-09-07T07:34:42.7637757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7637802Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7637841Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7637986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7638032Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7638068Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7638226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7638263Z t = dispatch_trace( 2025-09-07T07:34:42.7638296Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7638409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7638453Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7638489Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7638613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7638677Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7638713Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7638876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7638957Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7640041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7640268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7640307Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.7640364Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7640509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7640552Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7640587Z ^^^^^^^^^ 2025-09-07T07:34:42.7640720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7640762Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7640797Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7640947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7640996Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7641030Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7641186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7641250Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7641297Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7641472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7641513Z outs_pair = fn(*args) 2025-09-07T07:34:42.7641547Z ^^^^^^^^^ 2025-09-07T07:34:42.7641720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7641813Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7641858Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7643005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7643046Z outs_pair = fn(*args) 2025-09-07T07:34:42.7643082Z ^^^^^^^^^ 2025-09-07T07:34:42.7643262Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7643320Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7643365Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7643560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7643633Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7643679Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7643851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7643889Z outs_pair = fn(*args) 2025-09-07T07:34:42.7643976Z ^^^^^^^^^ 2025-09-07T07:34:42.7644169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7644214Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7644250Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7644437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7644483Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7644521Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7644646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7644689Z return handle_torch_function( 2025-09-07T07:34:42.7644725Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7645883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7645996Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.7646041Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7646211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7646252Z return func(*args, **kwargs) 2025-09-07T07:34:42.7646288Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7646413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7646454Z result = _engine_run_backward( 2025-09-07T07:34:42.7646557Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7646704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7646825Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7646877Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7647003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7647046Z return user_fn(self, *args) 2025-09-07T07:34:42.7647082Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7647229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7647299Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7647336Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7647495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7647542Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7647578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7647703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7648718Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.7648759Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7648924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7648976Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7649018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7649154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7649203Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7649241Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7649402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7649451Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7649490Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7649648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7649706Z t = dispatch_trace( 2025-09-07T07:34:42.7649742Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7649855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7649898Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7649934Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7650058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7650097Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7650131Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7650311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7650407Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7651414Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7651538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7651577Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7651611Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7651738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7651778Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7651813Z ^^^^^^^^^ 2025-09-07T07:34:42.7651962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7652011Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7652046Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7652089Z File "", line 1, in 2025-09-07T07:34:42.7652231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7652309Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7652355Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7652491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7652557Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7652596Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7652786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7652829Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7652867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7653038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7654036Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7654075Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.7654220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7654265Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7654301Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7654434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7654522Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7654568Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7654695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7654754Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7654797Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7654939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7654978Z leaves = list(leaves) 2025-09-07T07:34:42.7655014Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7655137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7655171Z return func(x) 2025-09-07T07:34:42.7655205Z ^^^^^^^ 2025-09-07T07:34:42.7655342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7655408Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7655465Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7655645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7655686Z return func(*args, **kwargs) 2025-09-07T07:34:42.7656757Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7656941Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7657027Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7657029Z 2025-09-07T07:34:42.7657236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7657239Z 2025-09-07T07:34:42.7657241Z 2025-09-07T07:34:42.7657313Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7657528Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7657530Z 2025-09-07T07:34:42.7657616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7657690Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7657726Z inline_call [] 2025-09-07T07:34:42.7657782Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7657840Z inductor [] 2025-09-07T07:34:42.7657912Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7657983Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7658239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7658356Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7658407Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7658559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7658645Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7658827Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7658947Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7659069Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7660092Z Traceback (most recent call last): 2025-09-07T07:34:42.7660255Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7660293Z self._run_test( 2025-09-07T07:34:42.7660406Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7660460Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7660523Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7660656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7660705Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7660743Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7660893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7660940Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7660978Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7661131Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7661197Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7661235Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7661378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7661459Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7661547Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7661700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7661744Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7661895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7661948Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7663012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7663155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7663206Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7663244Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7663362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7663449Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7663493Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7663619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7663681Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7663722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7663863Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7663906Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7663944Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7664084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7664124Z return aot_autograd( 2025-09-07T07:34:42.7664210Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7664347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7664416Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7664461Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7664623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7664707Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7665722Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7665924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7665968Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7666154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7666244Z fx_g = _create_graph( 2025-09-07T07:34:42.7666279Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7666442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7666476Z fx_g = make_fx( 2025-09-07T07:34:42.7666587Z ^^^^^^^^ 2025-09-07T07:34:42.7666778Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7666825Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7666862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7667010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7667052Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7667089Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7667247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7667286Z t = dispatch_trace( 2025-09-07T07:34:42.7667320Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7667434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7667475Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7667512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7667638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7668706Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7668744Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7668908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7668986Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7669051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7669175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7669213Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7669248Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7669374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7669417Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7669453Z ^^^^^^^^^ 2025-09-07T07:34:42.7669585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7669625Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7669662Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7669813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7669865Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7669898Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7670056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7670119Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7670163Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7670342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7671340Z outs_pair = fn(*args) 2025-09-07T07:34:42.7671376Z ^^^^^^^^^ 2025-09-07T07:34:42.7671617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7671683Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7671731Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7671905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7671944Z outs_pair = fn(*args) 2025-09-07T07:34:42.7671977Z ^^^^^^^^^ 2025-09-07T07:34:42.7672156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7672241Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7672285Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7672481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7672551Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7672598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7672771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7672810Z outs_pair = fn(*args) 2025-09-07T07:34:42.7672843Z ^^^^^^^^^ 2025-09-07T07:34:42.7673036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7673083Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7673120Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7673289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7673336Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7674377Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7674525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7674567Z return handle_torch_function( 2025-09-07T07:34:42.7674604Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7674744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7674819Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7674867Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7675034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7675073Z return func(*args, **kwargs) 2025-09-07T07:34:42.7675111Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7675233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7675278Z result = _engine_run_backward( 2025-09-07T07:34:42.7675313Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7675506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7675626Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7675676Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7675803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7675846Z return user_fn(self, *args) 2025-09-07T07:34:42.7675884Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7676043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7676087Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7677202Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7677363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7677407Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7677443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7677566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7677631Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7677684Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7677850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7677901Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7677942Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7678122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7678174Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7678212Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7678373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7678420Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7678459Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7678619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7678658Z t = dispatch_trace( 2025-09-07T07:34:42.7678693Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7678809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7678850Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7678905Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7679996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7680037Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7680071Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7680298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7680376Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7680421Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7680544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7680582Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7680617Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7680743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7680787Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7680821Z ^^^^^^^^^ 2025-09-07T07:34:42.7680970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7681018Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7681052Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7681093Z File "", line 1, in 2025-09-07T07:34:42.7681240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7681317Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7681362Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7681520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7681569Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7681607Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7682799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7682843Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7682879Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7683049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7683131Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7683169Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7683313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7683356Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7683392Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7683525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7683612Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7683657Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7683782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7683842Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7683886Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7684012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7684050Z leaves = list(leaves) 2025-09-07T07:34:42.7684086Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7684210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7684260Z return func(x) 2025-09-07T07:34:42.7684292Z ^^^^^^^ 2025-09-07T07:34:42.7685379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7685445Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7685486Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7685653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7685697Z return func(*args, **kwargs) 2025-09-07T07:34:42.7685732Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7685912Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7685996Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7685998Z 2025-09-07T07:34:42.7686211Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7686213Z 2025-09-07T07:34:42.7686215Z 2025-09-07T07:34:42.7686287Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7686559Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7686564Z 2025-09-07T07:34:42.7686651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7686725Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7686759Z inline_call [] 2025-09-07T07:34:42.7686844Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7686878Z inductor [] 2025-09-07T07:34:42.7686952Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7687024Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7687282Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7687394Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7687462Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7687631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7688695Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7688828Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7688947Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7689018Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7689052Z inline_call [] 2025-09-07T07:34:42.7689108Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7689180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7689250Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7689514Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7689626Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7689677Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7689826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7689938Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7690069Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7690187Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7690236Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7690361Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True _ 2025-09-07T07:34:42.7690404Z Traceback (most recent call last): 2025-09-07T07:34:42.7690567Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7690602Z self._run_test( 2025-09-07T07:34:42.7690715Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7691724Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7691765Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7691898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7691944Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7691982Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7692135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7692181Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7692219Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7692373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7692417Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7692458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7692600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7692680Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7692718Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7692870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7692928Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7693094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7693147Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7693190Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7693331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7693383Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7693420Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7694486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7694552Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7694596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7694725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7694788Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7694828Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7694969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7695012Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7695067Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7695204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7695243Z return aot_autograd( 2025-09-07T07:34:42.7695278Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7695414Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7695483Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7695529Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7695689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7695773Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7695819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7696001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7696044Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7696229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7697293Z fx_g = _create_graph( 2025-09-07T07:34:42.7697331Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7697496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7697530Z fx_g = make_fx( 2025-09-07T07:34:42.7697563Z ^^^^^^^^ 2025-09-07T07:34:42.7697742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7697789Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7697828Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7697974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7698016Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7698053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7698211Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7698266Z t = dispatch_trace( 2025-09-07T07:34:42.7698316Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7698430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7698471Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7698508Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7698632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7698675Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7698710Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7698872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7698951Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7699958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7700085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7700125Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7700159Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7700283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7700327Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7700360Z ^^^^^^^^^ 2025-09-07T07:34:42.7700517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7700557Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7700592Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7700740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7700789Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7700822Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7700981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7701042Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7701087Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7701261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7701302Z outs_pair = fn(*args) 2025-09-07T07:34:42.7701336Z ^^^^^^^^^ 2025-09-07T07:34:42.7701509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7701574Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7702572Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7702747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7702787Z outs_pair = fn(*args) 2025-09-07T07:34:42.7702820Z ^^^^^^^^^ 2025-09-07T07:34:42.7703021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7703081Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7703127Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7703324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7703394Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7703439Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7703630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7703680Z outs_pair = fn(*args) 2025-09-07T07:34:42.7703713Z ^^^^^^^^^ 2025-09-07T07:34:42.7703904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7703949Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7703986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7704155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7704201Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7704237Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7704363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7704407Z return handle_torch_function( 2025-09-07T07:34:42.7704447Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7705540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7705617Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7705662Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7705829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7705887Z return func(*args, **kwargs) 2025-09-07T07:34:42.7705923Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7706046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7706088Z result = _engine_run_backward( 2025-09-07T07:34:42.7706123Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7706271Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7706393Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7706446Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7706632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7706676Z return user_fn(self, *args) 2025-09-07T07:34:42.7706712Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7706856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7706899Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7706936Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7707093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7707140Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7707175Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7708263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7708326Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7708361Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7708530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7708584Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7708623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7708758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7708808Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7708865Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7709046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7709093Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7709132Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7709291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7709332Z t = dispatch_trace( 2025-09-07T07:34:42.7709365Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7709479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7709521Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7709558Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7709681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7709721Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7709756Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7709916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7710951Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7710993Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7711117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7711180Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7711214Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7711340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7711380Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7711414Z ^^^^^^^^^ 2025-09-07T07:34:42.7711567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7711618Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7711652Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7711694Z File "", line 1, in 
2025-09-07T07:34:42.7711838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7711915Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7711964Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7712101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7712149Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7712186Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7712379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7712423Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7712459Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7712643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7713639Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7713678Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7713821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7713863Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7713899Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7714033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7714120Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7714195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7714324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7714385Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7714428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7714557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7714596Z leaves = list(leaves) 2025-09-07T07:34:42.7714630Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7714754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7714789Z return func(x) 2025-09-07T07:34:42.7714822Z ^^^^^^^ 2025-09-07T07:34:42.7714961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7715027Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7715068Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7715236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7716225Z return func(*args, **kwargs) 2025-09-07T07:34:42.7716281Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7716462Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7716607Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7716609Z 2025-09-07T07:34:42.7716816Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7716820Z 2025-09-07T07:34:42.7716822Z 2025-09-07T07:34:42.7716895Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7717104Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7717108Z 2025-09-07T07:34:42.7717193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7717269Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7717304Z inline_call [] 2025-09-07T07:34:42.7717361Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7717395Z inductor [] 2025-09-07T07:34:42.7717470Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7717540Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7717800Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7717914Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7717987Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7718137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7718223Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7718352Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7718472Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7718541Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7719573Z inline_call [] 2025-09-07T07:34:42.7719648Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7719722Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7719791Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7720047Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7720207Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7720259Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7720408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7720492Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7720624Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7720744Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7720813Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7720849Z inline_call [] 2025-09-07T07:34:42.7720903Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-09-07T07:34:42.7721001Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7721069Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7721322Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7721433Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7721486Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7721633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7721716Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7722805Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7722926Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7723140Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-003e04e678f1dede.xml - 2025-09-07T07:34:42.7723198Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7723575Z FAILED [0.2593s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7723661Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7723663Z 2025-09-07T07:34:42.7723888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7723893Z 2025-09-07T07:34:42.7723896Z 2025-09-07T07:34:42.7723968Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7724174Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True 2025-09-07T07:34:42.7724176Z 2025-09-07T07:34:42.7724260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7724348Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.7724414Z ================== 1 failed, 245 deselected, 2 rerun in 1.24s ================== 2025-09-07T07:34:42.7724450Z Got exit code 1 2025-09-07T07:34:42.7724574Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.7724995Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7725036Z import pkg_resources 2025-09-07T07:34:42.7725204Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-381b59b9b2b38d84.xml 2025-09-07T07:34:42.7725260Z ============================= test session starts ============================== 2025-09-07T07:34:42.7725373Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7725412Z cachedir: .pytest_cache 2025-09-07T07:34:42.7725569Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7726643Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7726712Z configfile: pytest.ini 2025-09-07T07:34:42.7726874Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7726948Z collecting ... collected 467 items / 80 deselected / 387 selected 2025-09-07T07:34:42.7727000Z stepcurrent: skipping 80 already run items. 
2025-09-07T07:34:42.7727041Z Running 166 items in this shard 2025-09-07T07:34:42.7727044Z 2025-09-07T07:34:42.7727258Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9261s] [ 0%] 2025-09-07T07:34:42.7727464Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7021s] [ 0%] 2025-09-07T07:34:42.7727648Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True FAILED [0.7059s] [ 0%] 2025-09-07T07:34:42.7727652Z 2025-09-07T07:34:42.7727701Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7727822Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7727864Z Traceback (most recent call last): 2025-09-07T07:34:42.7728028Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7728063Z self._run_test( 2025-09-07T07:34:42.7728177Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7728232Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7728272Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7728423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7728471Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7728511Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7728663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7728709Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7729720Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7729857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7729939Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7729977Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7730121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7730204Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7730242Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7730396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7730443Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7730592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7730646Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7730685Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7730829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7730880Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7730918Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7731034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7731102Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7731165Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7731291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7731355Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7731396Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7732492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7732540Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7732578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7732716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7732756Z return aot_autograd( 2025-09-07T07:34:42.7732791Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7732927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7732999Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7733044Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7733205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7733287Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7733333Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7733514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7733573Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7733760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7733802Z fx_g = _create_graph( 2025-09-07T07:34:42.7733838Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7734001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7734036Z fx_g = make_fx( 2025-09-07T07:34:42.7734068Z ^^^^^^^^ 2025-09-07T07:34:42.7734220Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7734292Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7735283Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7735431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7735476Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7735512Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7735670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7735708Z t = dispatch_trace( 2025-09-07T07:34:42.7735742Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7735856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7735896Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7735932Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7736058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7736099Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7736134Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7736296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7736375Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7736433Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7736635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7736675Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7736708Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7736834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7736875Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7737875Z ^^^^^^^^^ 2025-09-07T07:34:42.7738009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7738050Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7738086Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7738236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7738287Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7738321Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7738479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7738540Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7738585Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7738761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7738801Z outs_pair = fn(*args) 2025-09-07T07:34:42.7738836Z ^^^^^^^^^ 2025-09-07T07:34:42.7739034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7739101Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7739147Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7739321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7739360Z outs_pair = fn(*args) 2025-09-07T07:34:42.7739394Z ^^^^^^^^^ 2025-09-07T07:34:42.7739571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7739668Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7740954Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7741152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7741224Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7741270Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7741443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7741481Z outs_pair = fn(*args) 2025-09-07T07:34:42.7741516Z ^^^^^^^^^ 2025-09-07T07:34:42.7741705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7741751Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7741788Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7741957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7742003Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7742040Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7742166Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7742235Z return handle_torch_function( 2025-09-07T07:34:42.7742270Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7742413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7742487Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7742531Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7742703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7742743Z return func(*args, **kwargs) 2025-09-07T07:34:42.7742779Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7743884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7743927Z result = _engine_run_backward( 2025-09-07T07:34:42.7743964Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7744111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7744230Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7744281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7744408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7744450Z return user_fn(self, *args) 2025-09-07T07:34:42.7744486Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7744648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7744691Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7744728Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7744886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7744930Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7744966Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7745089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7745128Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7745177Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7745354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7745407Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7745448Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7746623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7746675Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7746714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7746876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7746923Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7746961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7747121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7747159Z t = dispatch_trace( 2025-09-07T07:34:42.7747193Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7747306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7747350Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7747386Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7747542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7747582Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7747617Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7747777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7747855Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7747898Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7748022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7748060Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7748094Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7748222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7749237Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7749275Z ^^^^^^^^^ 2025-09-07T07:34:42.7749424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7749472Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7749505Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7749547Z File "", line 1, in 2025-09-07T07:34:42.7749690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7749770Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7749815Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7749972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7750020Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7750059Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7750251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7750294Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7750330Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7750500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7750561Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7750614Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7750759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7750801Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7750838Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7750971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7752027Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7752074Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7752200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7752259Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7752303Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7752428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7752467Z leaves = list(leaves) 2025-09-07T07:34:42.7752501Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7752625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7752659Z return func(x) 2025-09-07T07:34:42.7752711Z ^^^^^^^ 2025-09-07T07:34:42.7752847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7752913Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7752954Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7753122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7753166Z return func(*args, **kwargs) 2025-09-07T07:34:42.7753202Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7753384Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7753471Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7753475Z 2025-09-07T07:34:42.7753679Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7753683Z 2025-09-07T07:34:42.7753684Z 2025-09-07T07:34:42.7753756Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7754920Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7754924Z 2025-09-07T07:34:42.7755012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7755086Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7755123Z inline_call [] 2025-09-07T07:34:42.7755179Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7755229Z inductor [] 2025-09-07T07:34:42.7755304Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7755378Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7755636Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7755752Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7755802Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7755979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7756065Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7756197Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7756316Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7756439Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7756539Z Traceback (most recent call last): 2025-09-07T07:34:42.7756701Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7756737Z self._run_test( 2025-09-07T07:34:42.7756849Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7756906Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7756947Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7758067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7758114Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7758153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7758305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7758380Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7758418Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7758557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7758602Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7758640Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7758784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7758866Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7758903Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7759056Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7759103Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7759253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7759305Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7759345Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7759486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7759538Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7759577Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7759693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7760792Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7760839Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7760967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7761030Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7761071Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7761211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7761254Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7761315Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7761471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7761510Z return aot_autograd( 2025-09-07T07:34:42.7761546Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7761682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 
2025-09-07T07:34:42.7761751Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7761799Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7761960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7762041Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7762087Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7762270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7762314Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7762501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7762540Z fx_g = _create_graph( 2025-09-07T07:34:42.7762575Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7763722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7763757Z fx_g = make_fx( 2025-09-07T07:34:42.7763790Z ^^^^^^^^ 2025-09-07T07:34:42.7763940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7763987Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7764026Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7764174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7764216Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7764253Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7764412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 
2025-09-07T07:34:42.7764453Z t = dispatch_trace( 2025-09-07T07:34:42.7764486Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7764599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7764640Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7764675Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7764799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7764841Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7764879Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7765041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7765119Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7765176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7766260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7766300Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7766335Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7766459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7766574Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7766609Z ^^^^^^^^^ 2025-09-07T07:34:42.7766741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7766822Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7766858Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7767006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7767057Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7767090Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7767248Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7767311Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7767355Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7767531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7767573Z outs_pair = fn(*args) 2025-09-07T07:34:42.7767607Z ^^^^^^^^^ 2025-09-07T07:34:42.7767778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7767845Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7767890Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7768063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7769096Z outs_pair = fn(*args) 2025-09-07T07:34:42.7769131Z ^^^^^^^^^ 2025-09-07T07:34:42.7769309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7769369Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7769411Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7769611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7769681Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7769729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7769901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7769943Z 
outs_pair = fn(*args) 2025-09-07T07:34:42.7769976Z ^^^^^^^^^ 2025-09-07T07:34:42.7770167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7770211Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7770247Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7770420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7770466Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7770502Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7770650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7770693Z return handle_torch_function( 2025-09-07T07:34:42.7770731Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7770872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7771908Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7771953Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7772120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7772195Z return func(*args, **kwargs) 2025-09-07T07:34:42.7772231Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7772355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7772398Z result = _engine_run_backward( 2025-09-07T07:34:42.7772433Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7772579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7772699Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the 
backward pass 2025-09-07T07:34:42.7772749Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7772874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7772917Z return user_fn(self, *args) 2025-09-07T07:34:42.7772952Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7773096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7773139Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7773177Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7773337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7773399Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7773436Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7773560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7773599Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7774588Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7774755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7774809Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7774850Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7774986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7775036Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7775074Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7775239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7775285Z return sub_tracer._trace_inner(f, *args) 
2025-09-07T07:34:42.7775324Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7775481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7775520Z t = dispatch_trace( 2025-09-07T07:34:42.7775554Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7775668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7775710Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7775746Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7775885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7775925Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7775961Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7776121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7776200Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7776240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7777394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7777475Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7777511Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7777637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7777680Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7777714Z ^^^^^^^^^ 2025-09-07T07:34:42.7777863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7777913Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7777946Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7777987Z File "", line 1, in 2025-09-07T07:34:42.7778130Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7778207Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7778253Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7778390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7778437Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7778475Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7778667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7778730Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7778766Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7778935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7778979Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7779991Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7780138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7780180Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7780215Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7780351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7780438Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7780487Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7780611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7780671Z return treespec.unflatten(map(func, *flat_args)) 
2025-09-07T07:34:42.7780714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7780839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7780882Z leaves = list(leaves) 2025-09-07T07:34:42.7780917Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7781039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7781074Z return func(x) 2025-09-07T07:34:42.7781137Z ^^^^^^^ 2025-09-07T07:34:42.7781275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7781342Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7781383Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7781550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7781591Z return func(*args, **kwargs) 2025-09-07T07:34:42.7781625Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7782797Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7782883Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7782886Z 2025-09-07T07:34:42.7783093Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7783096Z 2025-09-07T07:34:42.7783098Z 2025-09-07T07:34:42.7783171Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7783380Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7783382Z 2025-09-07T07:34:42.7783467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7783541Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7783578Z inline_call [] 2025-09-07T07:34:42.7783635Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7783669Z inductor [] 2025-09-07T07:34:42.7783742Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7783815Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7784073Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7784204Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7784256Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7784410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7784499Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7784630Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7784749Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7784819Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7784855Z inline_call [] 2025-09-07T07:34:42.7785877Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7785952Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7786022Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7786276Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7786392Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7786442Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7786674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7786781Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7786914Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7787033Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7787083Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7787205Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7787248Z Traceback (most recent call last): 2025-09-07T07:34:42.7787445Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7787481Z self._run_test( 2025-09-07T07:34:42.7787592Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7787649Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7787688Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7787822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7787868Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7787908Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7788058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7789086Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7789126Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7789265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7789308Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7789346Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7789492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7789597Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7789635Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7789787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7789832Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7789985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7790040Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7790081Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7790223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7790275Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7790313Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7790432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7790497Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7790541Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7790666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7790728Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7791737Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7791880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7791923Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7791977Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7792114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7792156Z return aot_autograd( 2025-09-07T07:34:42.7792192Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7792328Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7792397Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7792443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7792625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7792722Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7792767Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7792950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7792994Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7793179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7793219Z fx_g = _create_graph( 2025-09-07T07:34:42.7793254Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7793417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7793452Z fx_g = make_fx( 2025-09-07T07:34:42.7793486Z ^^^^^^^^ 2025-09-07T07:34:42.7793637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7794642Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7794682Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7794829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7794889Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7794926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7795084Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7795123Z t = dispatch_trace( 2025-09-07T07:34:42.7795157Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7795270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7795314Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7795349Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7795475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7795516Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7795552Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7795713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7795795Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7795835Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7795961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7796000Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7796035Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7796161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7797250Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7797286Z ^^^^^^^^^ 2025-09-07T07:34:42.7797446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7797487Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7797525Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7797674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7797723Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7797757Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7797913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7797973Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7798060Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7798236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7798276Z outs_pair = fn(*args) 2025-09-07T07:34:42.7798312Z ^^^^^^^^^ 2025-09-07T07:34:42.7798487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7798554Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7798598Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7798772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7798810Z outs_pair = fn(*args) 2025-09-07T07:34:42.7798846Z ^^^^^^^^^ 2025-09-07T07:34:42.7799022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7799082Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7800096Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7800337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7800437Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7800483Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7800654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7800693Z outs_pair = fn(*args) 2025-09-07T07:34:42.7800729Z ^^^^^^^^^ 2025-09-07T07:34:42.7800921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7800966Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7801003Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7801173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7801221Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7801257Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7801384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7801427Z return handle_torch_function( 2025-09-07T07:34:42.7801463Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7801605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7801684Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7801729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7801911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7801952Z return func(*args, **kwargs) 2025-09-07T07:34:42.7802955Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7803082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7803124Z result = _engine_run_backward( 2025-09-07T07:34:42.7803159Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7803306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7803426Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7803507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7803633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7803675Z return user_fn(self, *args) 2025-09-07T07:34:42.7803713Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7803857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7803905Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7803941Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7804098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7804141Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7804178Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7804301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7804341Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7804376Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7804543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7804594Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7804651Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7805745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7805796Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7805834Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7805995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7806043Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7806083Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7806241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7806279Z t = dispatch_trace( 2025-09-07T07:34:42.7806314Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7806426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7806470Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7806573Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7806696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7806734Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7806770Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7806928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7807011Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7807052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7807201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7807241Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7807276Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7807402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7808419Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7808453Z ^^^^^^^^^ 2025-09-07T07:34:42.7808605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7808653Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7808709Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7808771Z File "", line 1, in 
2025-09-07T07:34:42.7808915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7808992Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7809039Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7809176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7809224Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7809261Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7809455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7809497Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7809534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7809705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7809749Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7809785Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7809928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7809990Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7810025Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7811117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7811207Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7811252Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7811382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7811443Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7811486Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7811612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7811651Z leaves = list(leaves) 2025-09-07T07:34:42.7811685Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7811810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7811845Z return func(x) 2025-09-07T07:34:42.7811877Z ^^^^^^^ 2025-09-07T07:34:42.7812015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7812079Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7812121Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7812288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7812331Z return func(*args, **kwargs) 2025-09-07T07:34:42.7812366Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7812561Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7812647Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7812649Z 2025-09-07T07:34:42.7812856Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7812859Z 2025-09-07T07:34:42.7812861Z 2025-09-07T07:34:42.7812932Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7814115Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7814132Z 2025-09-07T07:34:42.7814217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7814293Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7814328Z inline_call [] 2025-09-07T07:34:42.7814385Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7814420Z inductor [] 2025-09-07T07:34:42.7814494Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7814565Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7814822Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7814939Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7814990Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7815141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7815230Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7815360Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7815499Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7815569Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7815604Z inline_call [] 2025-09-07T07:34:42.7815659Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7815732Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7815803Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7816058Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7817212Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7817265Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7817416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7817501Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7817630Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7817748Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7817822Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7817857Z inline_call [] 2025-09-07T07:34:42.7817911Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7818005Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7818075Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7818328Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7818440Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7818490Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7818638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7818763Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7818893Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7819012Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7819229Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-381b59b9b2b38d84.xml - 2025-09-07T07:34:42.7819287Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7819660Z FAILED [0.7059s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7820721Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7820724Z 2025-09-07T07:34:42.7820933Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7820935Z 2025-09-07T07:34:42.7820938Z 2025-09-07T07:34:42.7821011Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7821237Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7821240Z 2025-09-07T07:34:42.7821324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7821383Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.7821448Z ================== 1 failed, 80 deselected, 2 rerun in 2.63s =================== 2025-09-07T07:34:42.7821486Z Got exit code 1 2025-09-07T07:34:42.7821526Z Retrying single test... 2025-09-07T07:34:42.7821948Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7821990Z import pkg_resources 2025-09-07T07:34:42.7822159Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-c1c4c05ef46d9851.xml 2025-09-07T07:34:42.7822215Z ============================= test session starts ============================== 2025-09-07T07:34:42.7822328Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7822368Z cachedir: .pytest_cache 2025-09-07T07:34:42.7822526Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7822573Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7822612Z configfile: pytest.ini 2025-09-07T07:34:42.7822785Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7822861Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7823108Z stepcurrent: skipping 80 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7823149Z Running 1 items in this shard 2025-09-07T07:34:42.7823151Z 2025-09-07T07:34:42.7824325Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.9227s] [100%] 2025-09-07T07:34:42.7824560Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7650s] [100%] 2025-09-07T07:34:42.7824744Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True FAILED [0.7594s] [100%] 2025-09-07T07:34:42.7824747Z 2025-09-07T07:34:42.7824796Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7824918Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7824961Z Traceback (most recent call last): 2025-09-07T07:34:42.7825125Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7825160Z self._run_test( 2025-09-07T07:34:42.7825272Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7825328Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7825369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7825503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7825550Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7825589Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7825740Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7825802Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7825842Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7825978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7826022Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7826059Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7826204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7827333Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7827373Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7827527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7827574Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7827725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7827779Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7827819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7827961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7828014Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7828052Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7828170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7828266Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7828311Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7828438Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7828502Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7828542Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7828683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7828726Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7828782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7828937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7828977Z return aot_autograd( 2025-09-07T07:34:42.7829011Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7830119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7830190Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7830237Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7830398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7830481Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7830526Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7830711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7830754Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7830941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7830981Z fx_g = _create_graph( 2025-09-07T07:34:42.7831016Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7831202Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7831237Z fx_g = make_fx( 2025-09-07T07:34:42.7831269Z ^^^^^^^^ 2025-09-07T07:34:42.7831421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7831468Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7831507Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7831655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7831697Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7831734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7831894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7832892Z t = dispatch_trace( 2025-09-07T07:34:42.7832927Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7833041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7833082Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7833118Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7833245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7833287Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7833323Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7833486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7833565Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7833623Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7833747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7833788Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.7833822Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7833949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7833990Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7834026Z ^^^^^^^^^ 2025-09-07T07:34:42.7834159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7834227Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7834263Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7834411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7834462Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7835453Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7835612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7835673Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7835718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7835892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7835933Z outs_pair = fn(*args) 2025-09-07T07:34:42.7835968Z ^^^^^^^^^ 2025-09-07T07:34:42.7836142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7836208Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7836254Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7836427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7836559Z outs_pair = fn(*args) 2025-09-07T07:34:42.7836594Z ^^^^^^^^^ 2025-09-07T07:34:42.7836772Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7836831Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7836874Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7837071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7837143Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7837188Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7837360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7837400Z outs_pair = fn(*args) 2025-09-07T07:34:42.7838403Z ^^^^^^^^^ 2025-09-07T07:34:42.7838594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7838639Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7838675Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7838846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7838890Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7838927Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7839080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7839125Z return handle_torch_function( 2025-09-07T07:34:42.7839164Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7839307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7839381Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.7839425Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7839594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7839674Z return func(*args, **kwargs) 2025-09-07T07:34:42.7839711Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7839834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7839877Z result = _engine_run_backward( 2025-09-07T07:34:42.7839913Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7840059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7840223Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7840273Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7841370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7841414Z return user_fn(self, *args) 2025-09-07T07:34:42.7841452Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7841597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7841640Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7841677Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7841834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7841904Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7841939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7842062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7842101Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.7842136Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7842302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7842357Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7842396Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7842532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7842583Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7842621Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7842784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7842831Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7842870Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7843028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7844012Z t = dispatch_trace( 2025-09-07T07:34:42.7844047Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7844161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7844203Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7844239Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7844377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7844419Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7844454Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7844613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7844691Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7844732Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7844855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7844920Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7844955Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7845083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7845124Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7845159Z ^^^^^^^^^ 2025-09-07T07:34:42.7845310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7845359Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7845393Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7845434Z File "", line 1, in 2025-09-07T07:34:42.7846598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7846679Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7846726Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7846862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7846910Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7846950Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7847142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7847213Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7847249Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7847419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7847464Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7847500Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.7847646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7847688Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7847724Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7847858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7847945Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7847992Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7848119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7848178Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7848222Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7848346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7849354Z leaves = list(leaves) 2025-09-07T07:34:42.7849389Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7849513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7849547Z return func(x) 2025-09-07T07:34:42.7849601Z ^^^^^^^ 2025-09-07T07:34:42.7849739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7849806Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7849848Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7850014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7850055Z return func(*args, **kwargs) 2025-09-07T07:34:42.7850090Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7850311Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7850396Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7850399Z 2025-09-07T07:34:42.7850605Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7850608Z 2025-09-07T07:34:42.7850610Z 2025-09-07T07:34:42.7850683Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7850889Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7850891Z 2025-09-07T07:34:42.7850977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7851051Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7851087Z inline_call [] 2025-09-07T07:34:42.7851144Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7851178Z inductor [] 2025-09-07T07:34:42.7852221Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7852296Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7852553Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7852687Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7852738Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7852890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7852979Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7853110Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7853228Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7853349Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7853395Z Traceback (most recent call last): 2025-09-07T07:34:42.7853554Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7853589Z self._run_test( 2025-09-07T07:34:42.7853700Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7853755Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7853794Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7853929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7853974Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7854013Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7854177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7854225Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7854265Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7855358Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7855402Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7855440Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7855583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7855695Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7855734Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7855888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7855933Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7856083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7856138Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7856180Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7856320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7856371Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7856409Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7856595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7856661Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7856705Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7856832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7856895Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7856961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7857099Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7858115Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7858153Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7858293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7858336Z return aot_autograd( 2025-09-07T07:34:42.7858372Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7858506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7858577Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7858622Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7858788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7858869Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7858914Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7859096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7859139Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7859325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7859365Z fx_g = _create_graph( 2025-09-07T07:34:42.7859425Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7859590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7859626Z fx_g = make_fx( 2025-09-07T07:34:42.7859659Z ^^^^^^^^ 2025-09-07T07:34:42.7859810Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7859856Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7859893Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7860997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7861079Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7861116Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7861275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7861314Z t = dispatch_trace( 2025-09-07T07:34:42.7861348Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7861461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7861504Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7861539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7861663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7861702Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7861738Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7861899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7861978Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7862018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7862143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7862181Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7862232Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7862358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7862400Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7862434Z ^^^^^^^^^ 2025-09-07T07:34:42.7863522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7863562Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7863601Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7863750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7863799Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7863832Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7863990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7864054Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7864098Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7864276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7864315Z outs_pair = fn(*args) 2025-09-07T07:34:42.7864349Z ^^^^^^^^^ 2025-09-07T07:34:42.7864522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7864590Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7864634Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7864823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7864862Z outs_pair = fn(*args) 2025-09-07T07:34:42.7864898Z ^^^^^^^^^ 2025-09-07T07:34:42.7865074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7865134Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7865176Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7865369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7866422Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7866469Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7866755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7866795Z outs_pair = fn(*args) 2025-09-07T07:34:42.7866830Z ^^^^^^^^^ 2025-09-07T07:34:42.7867019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7867064Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7867101Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7867269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7867319Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7867354Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7867481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7867523Z return handle_torch_function( 2025-09-07T07:34:42.7867561Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7867702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7867802Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7867847Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7868014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7868053Z return func(*args, **kwargs) 2025-09-07T07:34:42.7868090Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7868216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7869247Z result = _engine_run_backward( 2025-09-07T07:34:42.7869283Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7869431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7869552Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7869604Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7869732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7869772Z return user_fn(self, *args) 2025-09-07T07:34:42.7869809Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7869953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7869998Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7870033Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7870214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7870260Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7870296Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7870423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7870464Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7870498Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7870664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7870715Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7870789Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7870925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7870975Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7871977Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7872140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7872188Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7872227Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7872386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7872424Z t = dispatch_trace( 2025-09-07T07:34:42.7872458Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7872570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7872615Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7872651Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7872774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7872813Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7872849Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7873010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7873107Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7873147Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7873272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7873309Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7873345Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7873471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7873513Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7873547Z ^^^^^^^^^ 2025-09-07T07:34:42.7874659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7874709Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7874745Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7874786Z File "", line 1, in 2025-09-07T07:34:42.7874930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7875006Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7875051Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7875191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7875239Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7875276Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7875484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7875527Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7875565Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7875736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7875780Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7875818Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7875960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7876016Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7876065Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7876201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7876288Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7877363Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7877491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7877551Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7877593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7877721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7877758Z leaves = list(leaves) 2025-09-07T07:34:42.7877794Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7877917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7877953Z return func(x) 2025-09-07T07:34:42.7877984Z ^^^^^^^ 2025-09-07T07:34:42.7878124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7878187Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7878259Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7878427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7878468Z return func(*args, **kwargs) 2025-09-07T07:34:42.7878503Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7878683Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7878770Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7878773Z 2025-09-07T07:34:42.7878980Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7878984Z 2025-09-07T07:34:42.7878986Z 2025-09-07T07:34:42.7879057Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7879265Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7879267Z 2025-09-07T07:34:42.7879351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7880429Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7880465Z inline_call [] 2025-09-07T07:34:42.7880523Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7880557Z inductor [] 2025-09-07T07:34:42.7880632Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7880703Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7880985Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7881102Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7881154Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7881305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7881391Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7881540Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7881677Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7881748Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7881783Z inline_call [] 2025-09-07T07:34:42.7881838Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7881911Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7881982Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7882236Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7882349Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7882399Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7883524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7883611Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7883741Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7883861Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7883930Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7884050Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7884095Z Traceback (most recent call last): 2025-09-07T07:34:42.7884257Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7884294Z self._run_test( 2025-09-07T07:34:42.7884406Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7884461Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7884501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7884635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7884683Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7884723Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7884874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7884920Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7884958Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7885095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7885139Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7885179Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7885322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7886399Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7886439Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7886688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7886733Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7886883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7886935Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7886996Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7887156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7887208Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7887246Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7887364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7887430Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7887473Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7887599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7887662Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7887704Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7887844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7887889Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7887926Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7888064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7888102Z return aot_autograd( 2025-09-07T07:34:42.7889135Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7889301Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7889371Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7889416Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7889577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7889661Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7889707Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7889889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7889934Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7890119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7890161Z fx_g = _create_graph( 2025-09-07T07:34:42.7890196Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7890359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7890393Z fx_g = make_fx( 2025-09-07T07:34:42.7890426Z ^^^^^^^^ 2025-09-07T07:34:42.7890578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7890624Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7890663Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7890826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7890868Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7890904Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7891064Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7892076Z t = dispatch_trace( 2025-09-07T07:34:42.7892111Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7892224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7892266Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7892316Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7892458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7892498Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7892534Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7892697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7892775Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7892819Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7892942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7892980Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7893015Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7893140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7893183Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7893218Z ^^^^^^^^^ 2025-09-07T07:34:42.7893350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7893390Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7893428Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7893574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7894624Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7894659Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7894816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7894877Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7894921Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7895100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7895140Z outs_pair = fn(*args) 2025-09-07T07:34:42.7895174Z ^^^^^^^^^ 2025-09-07T07:34:42.7895347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7895414Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7895460Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7895636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7895674Z outs_pair = fn(*args) 2025-09-07T07:34:42.7895709Z ^^^^^^^^^ 2025-09-07T07:34:42.7895884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7895948Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7895990Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7896204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7896274Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7896323Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7896560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7896599Z outs_pair = fn(*args) 2025-09-07T07:34:42.7897626Z ^^^^^^^^^ 2025-09-07T07:34:42.7897818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7897914Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7897952Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7898121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7898168Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7898205Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7898331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7898373Z return handle_torch_function( 2025-09-07T07:34:42.7898409Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7898553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7898627Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7898673Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7898842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7898882Z return func(*args, **kwargs) 2025-09-07T07:34:42.7898917Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7899042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7899105Z result = _engine_run_backward( 2025-09-07T07:34:42.7899141Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7899288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7899409Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7899459Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7900590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7900631Z return user_fn(self, *args) 2025-09-07T07:34:42.7900668Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7900813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7900857Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7900896Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7901054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7901097Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7901135Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7901258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7901299Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7901336Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7901503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7901554Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7901626Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7901764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7901816Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7901854Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7902020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7902066Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7902105Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7903273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7903313Z t = dispatch_trace( 2025-09-07T07:34:42.7903346Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7903461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7903503Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7903539Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7903664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7903702Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7903737Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7903898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7903976Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7904018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7904142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7904179Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7904214Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7904342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7904401Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7904435Z ^^^^^^^^^ 2025-09-07T07:34:42.7904587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7904635Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7904670Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7904711Z File "", line 1, in 
2025-09-07T07:34:42.7905839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7905918Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7905964Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7906100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7906148Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7906188Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7906380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7906423Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7906458Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7906716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7906763Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7906801Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7906943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7907009Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7907045Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7907181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7907269Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7907315Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7907440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7907500Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.7907579Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7908689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7908729Z leaves = list(leaves) 2025-09-07T07:34:42.7908764Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7908887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7908923Z return func(x) 2025-09-07T07:34:42.7908955Z ^^^^^^^ 2025-09-07T07:34:42.7909093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7909157Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7909199Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7909365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7909409Z return func(*args, **kwargs) 2025-09-07T07:34:42.7909443Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7909625Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7909712Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7909714Z 2025-09-07T07:34:42.7909946Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7909949Z 2025-09-07T07:34:42.7909950Z 2025-09-07T07:34:42.7910023Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7910230Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7910234Z 2025-09-07T07:34:42.7910320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7910395Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7910429Z inline_call [] 2025-09-07T07:34:42.7910488Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7910522Z inductor [] 2025-09-07T07:34:42.7911569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7911645Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7911904Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7912019Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7912071Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7912223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7912309Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7912458Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7912578Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7912650Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7912685Z inline_call [] 2025-09-07T07:34:42.7912741Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7912813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7912883Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7913165Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7913279Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7913330Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7913480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7913564Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7913694Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7913812Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7914857Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7914894Z inline_call [] 2025-09-07T07:34:42.7914951Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7915022Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7915091Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7915346Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7915479Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7915528Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7915677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7915760Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7915891Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7916008Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7916226Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-c1c4c05ef46d9851.xml - 2025-09-07T07:34:42.7916284Z =========================== short test summary info ============================ 2025-09-07T07:34:42.7916740Z FAILED [0.7594s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7916824Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7916826Z 2025-09-07T07:34:42.7917037Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7917039Z 2025-09-07T07:34:42.7917043Z 2025-09-07T07:34:42.7917115Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7917341Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7917345Z 2025-09-07T07:34:42.7917430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7917489Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T07:34:42.7918549Z ================== 1 failed, 245 deselected, 2 rerun in 2.62s ================== 2025-09-07T07:34:42.7918585Z Got exit code 1 2025-09-07T07:34:42.7918624Z Retrying single test... 2025-09-07T07:34:42.7919096Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. 
See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.7919137Z import pkg_resources 2025-09-07T07:34:42.7919306Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f5a4b9704255c2ad.xml 2025-09-07T07:34:42.7919365Z ============================= test session starts ============================== 2025-09-07T07:34:42.7919478Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.7919517Z cachedir: .pytest_cache 2025-09-07T07:34:42.7919673Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.7919719Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.7919757Z configfile: pytest.ini 2025-09-07T07:34:42.7919920Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.7919997Z collecting ... collected 467 items / 245 deselected / 222 selected 2025-09-07T07:34:42.7920289Z stepcurrent: skipping 80 already run items. 
Running only test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7920351Z Running 1 items in this shard 2025-09-07T07:34:42.7920353Z 2025-09-07T07:34:42.7920562Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [1.0406s] [100%] 2025-09-07T07:34:42.7920768Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True ('RERUN', {'yellow': True}) [0.7458s] [100%] 2025-09-07T07:34:42.7920956Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True FAILED [0.7628s] [100%] 2025-09-07T07:34:42.7920959Z 2025-09-07T07:34:42.7921006Z ==================================== RERUNS ==================================== 2025-09-07T07:34:42.7921129Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7921175Z Traceback (most recent call last): 2025-09-07T07:34:42.7922335Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7922371Z self._run_test( 2025-09-07T07:34:42.7922485Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7922540Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7922581Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7922720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7922767Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7922805Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7922973Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7923022Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7923061Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7923200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7923243Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7923281Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7923427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7923539Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7923578Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7923731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7923777Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7923928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7923982Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7924997Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7925139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7925190Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7925228Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7925351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7925417Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7925462Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7925588Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7925651Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7925714Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7925854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7925898Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7925936Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7926072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7926114Z return aot_autograd( 2025-09-07T07:34:42.7926148Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7926285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7926356Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7926401Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7926642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7926725Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7926771Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7927932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7927979Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7928165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7928204Z fx_g = _create_graph( 2025-09-07T07:34:42.7928261Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7928426Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7928462Z fx_g = make_fx( 2025-09-07T07:34:42.7928495Z ^^^^^^^^ 2025-09-07T07:34:42.7928646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7928692Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7928729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7928877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7928954Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7928991Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7929152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7929191Z t = dispatch_trace( 2025-09-07T07:34:42.7929225Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7929338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7929381Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7929417Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7929542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7930558Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7930595Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7930760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7930839Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7930881Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7931006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7931045Z return fn(*args, 
**kwargs) 2025-09-07T07:34:42.7931105Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7931230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7931272Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7931306Z ^^^^^^^^^ 2025-09-07T07:34:42.7931439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7931478Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7931516Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7931665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7931715Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7931748Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7931907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7931971Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7932015Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7932189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7932229Z outs_pair = fn(*args) 2025-09-07T07:34:42.7933231Z ^^^^^^^^^ 2025-09-07T07:34:42.7933406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7933474Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7933518Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7933711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7933751Z outs_pair = fn(*args) 2025-09-07T07:34:42.7933788Z ^^^^^^^^^ 2025-09-07T07:34:42.7933965Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7934024Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7934066Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7934260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7934357Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7934404Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7934580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7934617Z outs_pair = fn(*args) 2025-09-07T07:34:42.7934652Z ^^^^^^^^^ 2025-09-07T07:34:42.7934842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7934886Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7934923Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7935091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7935139Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7935175Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7936264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7936306Z return handle_torch_function( 2025-09-07T07:34:42.7936346Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7936550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7936660Z result = mode.__torch_function__(public_api, types, args, kwargs) 
2025-09-07T07:34:42.7936705Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7936874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7936914Z return func(*args, **kwargs) 2025-09-07T07:34:42.7936953Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7937077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7937119Z result = _engine_run_backward( 2025-09-07T07:34:42.7937154Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7937303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7937423Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7937474Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7937600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7937642Z return user_fn(self, *args) 2025-09-07T07:34:42.7937678Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7937822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7937866Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7937902Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7939068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7939115Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7939151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7939279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7939317Z return fn(*args, **kwargs) 
2025-09-07T07:34:42.7939352Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7939519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7939571Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7939647Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7939787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7939836Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7939875Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7940037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7940085Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7940124Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7940284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7940322Z t = dispatch_trace( 2025-09-07T07:34:42.7940357Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7940471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7940515Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7940551Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7940674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7941687Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7941723Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7941884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7941981Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7942021Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7942143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7942182Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7942218Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7942344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7942386Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7942420Z ^^^^^^^^^ 2025-09-07T07:34:42.7942572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7942619Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7942656Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7942697Z File "", line 1, in 2025-09-07T07:34:42.7942839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7942916Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7942961Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7943099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7943147Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7943185Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7944353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7944397Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7944435Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7944606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7944650Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7944686Z ^^^^^^^^^^^^ 
2025-09-07T07:34:42.7944829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7944884Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7944934Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7945068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7945157Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7945201Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7945331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7945390Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7945433Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7945558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7945596Z leaves = list(leaves) 2025-09-07T07:34:42.7945634Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7945756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7945792Z return func(x) 2025-09-07T07:34:42.7945824Z ^^^^^^^ 2025-09-07T07:34:42.7945963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7947075Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7947145Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7947312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7947353Z return func(*args, **kwargs) 2025-09-07T07:34:42.7947388Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7947568Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7947656Z 
RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7947658Z 2025-09-07T07:34:42.7947864Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7947868Z 2025-09-07T07:34:42.7947870Z 2025-09-07T07:34:42.7947941Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7948151Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7948153Z 2025-09-07T07:34:42.7948238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7948312Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7948347Z inline_call [] 2025-09-07T07:34:42.7948405Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7948439Z inductor [] 2025-09-07T07:34:42.7948514Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7948586Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7948860Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. 
Traceback of forward call that caused the error: 2025-09-07T07:34:42.7948978Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7949029Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7949180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7950241Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7950394Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 2025-09-07T07:34:42.7950530Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7950651Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7950697Z Traceback (most recent call last): 2025-09-07T07:34:42.7950861Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7950899Z self._run_test( 2025-09-07T07:34:42.7951011Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7951067Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7951107Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7951239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7951286Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7951325Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7951474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7951522Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7951560Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7951719Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7951762Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7951801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7951941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7952022Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7952064Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7953186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7953233Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7953386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7953439Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7953482Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7953623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7953674Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7953712Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7953828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7953897Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7953939Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7954065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7954146Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7954189Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7954331Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7954375Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7954412Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7954552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7954590Z return aot_autograd( 2025-09-07T07:34:42.7954639Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7954787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7955816Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7955863Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7956023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7956106Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7956151Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7956335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7956378Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7956629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7956670Z fx_g = _create_graph( 2025-09-07T07:34:42.7956704Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7956868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7956902Z fx_g = make_fx( 2025-09-07T07:34:42.7956934Z ^^^^^^^^ 2025-09-07T07:34:42.7957115Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7957161Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7957198Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7957344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7957388Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7957426Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7957584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7957621Z t = dispatch_trace( 2025-09-07T07:34:42.7957655Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7958743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7958787Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7958826Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7958950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7958989Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7959025Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7959186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7959266Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7959307Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7959432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7959469Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7959527Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7959653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7959697Z 
(self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7959732Z ^^^^^^^^^ 2025-09-07T07:34:42.7959863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7959904Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7959939Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7960088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7960234Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7960269Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7960426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7961471Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7961516Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7961695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7961734Z outs_pair = fn(*args) 2025-09-07T07:34:42.7961768Z ^^^^^^^^^ 2025-09-07T07:34:42.7961939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7962007Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7962053Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7962227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7962266Z outs_pair = fn(*args) 2025-09-07T07:34:42.7962301Z ^^^^^^^^^ 2025-09-07T07:34:42.7962476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7962555Z return _functionalized_f_helper(primals, tangents) 
2025-09-07T07:34:42.7962596Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7962791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7962861Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7962910Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7963083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7963121Z outs_pair = fn(*args) 2025-09-07T07:34:42.7963156Z ^^^^^^^^^ 2025-09-07T07:34:42.7963346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7964357Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7964395Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7964563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7964608Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7964645Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7964774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7964816Z return handle_torch_function( 2025-09-07T07:34:42.7964851Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7965008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7965082Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7965129Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7965295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in 
__torch_function__ 2025-09-07T07:34:42.7965336Z return func(*args, **kwargs) 2025-09-07T07:34:42.7965371Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7965494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7965564Z result = _engine_run_backward( 2025-09-07T07:34:42.7965601Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7965746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7965869Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7965920Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7966046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7966088Z return user_fn(self, *args) 2025-09-07T07:34:42.7967248Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7967396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7967441Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7967478Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7967637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7967679Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7967718Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7967844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7967912Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7967947Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7968112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7968163Z 
return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7968203Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7968340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7968389Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7968428Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7968591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 2025-09-07T07:34:42.7968639Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7968678Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7968837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7968875Z t = dispatch_trace( 2025-09-07T07:34:42.7969884Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7969998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7970040Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7970078Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7970203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7970241Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7970276Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7970459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7970541Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7970581Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7970706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7970743Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7970777Z ^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7970903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7970978Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7971013Z ^^^^^^^^^ 2025-09-07T07:34:42.7971162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7971211Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7971245Z ^^^^^^^^^^^ 2025-09-07T07:34:42.7971287Z File "", line 1, in 2025-09-07T07:34:42.7971432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.7971509Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.7972519Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7972656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.7972704Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.7972744Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7972935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7972979Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7973015Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7973187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.7973248Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.7973286Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7973428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.7973471Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.7973506Z ^^^^^^^^^^^^^^^^^^^^^ 
2025-09-07T07:34:42.7973647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.7973736Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.7973782Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7973907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.7973969Z return treespec.unflatten(map(func, *flat_args)) 2025-09-07T07:34:42.7974012Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7974137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.7974174Z leaves = list(leaves) 2025-09-07T07:34:42.7975168Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.7975292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.7975328Z return func(x) 2025-09-07T07:34:42.7975361Z ^^^^^^^ 2025-09-07T07:34:42.7975499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.7975565Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.7975622Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7975789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7975831Z return func(*args, **kwargs) 2025-09-07T07:34:42.7975866Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7976046Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.7976131Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.7976148Z 2025-09-07T07:34:42.7976374Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.7976378Z 2025-09-07T07:34:42.7976379Z 2025-09-07T07:34:42.7976452Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.7976740Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.7976743Z 2025-09-07T07:34:42.7976828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.7976903Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7976938Z inline_call [] 2025-09-07T07:34:42.7976995Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7977028Z inductor [] 2025-09-07T07:34:42.7977101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7977176Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7978515Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7978635Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7978716Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7978867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7978953Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7979084Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7979202Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7979276Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.7979311Z inline_call [] 2025-09-07T07:34:42.7979367Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.7979440Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.7979511Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.7979770Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.7979883Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.7979933Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.7980084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.7980170Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.7980300Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.7980435Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7980488Z =================================== FAILURES =================================== 2025-09-07T07:34:42.7980610Z _ WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True _ 2025-09-07T07:34:42.7981702Z Traceback (most recent call last): 2025-09-07T07:34:42.7981868Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1441, in test_while_loop_with_unbacked_symint_closure 2025-09-07T07:34:42.7981904Z self._run_test( 2025-09-07T07:34:42.7982016Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1171, in _run_test 2025-09-07T07:34:42.7982109Z result_compiled = compiled_fn(*cloned_inputs2) 2025-09-07T07:34:42.7982149Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7982283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__ 2025-09-07T07:34:42.7982330Z return super().__call__(*args, **kwargs) 2025-09-07T07:34:42.7982369Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7982523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl 2025-09-07T07:34:42.7982569Z return self._call_impl(*args, **kwargs) 2025-09-07T07:34:42.7982608Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7982743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl 2025-09-07T07:34:42.7982787Z return forward_call(*args, **kwargs) 2025-09-07T07:34:42.7982825Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7982969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 845, in compile_wrapper 2025-09-07T07:34:42.7983052Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-09-07T07:34:42.7983091Z 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7983245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2196, in _call_user_compiler 2025-09-07T07:34:42.7983307Z raise BackendCompilerFailed( 2025-09-07T07:34:42.7983456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/output_graph.py", line 2171, in _call_user_compiler 2025-09-07T07:34:42.7983509Z compiled_fn = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7984515Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7984658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__ 2025-09-07T07:34:42.7984712Z compiled_gm = compiler_fn(gm, example_inputs) 2025-09-07T07:34:42.7984751Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7984866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 2425, in __call__ 2025-09-07T07:34:42.7984933Z return self.compiler_fn(model_, inputs_, **self.kwargs) 2025-09-07T07:34:42.7984976Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7985106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/testing.py", line 267, in __call__ 2025-09-07T07:34:42.7985169Z return lookup_backend(self.backend)(gm, example_inputs) 2025-09-07T07:34:42.7985210Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7985349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/inductor.py", line 31, in inductor 2025-09-07T07:34:42.7985394Z return compile_fx(*args, **kwargs) 2025-09-07T07:34:42.7985434Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7985571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2681, in compile_fx 2025-09-07T07:34:42.7985610Z return aot_autograd( 2025-09-07T07:34:42.7985646Z ^^^^^^^^^^^^^ 2025-09-07T07:34:42.7985801Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__ 2025-09-07T07:34:42.7985872Z cg = aot_module_simplified(gm, example_inputs, **self.kwargs) 2025-09-07T07:34:42.7985918Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7986081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1105, in aot_module_simplified 2025-09-07T07:34:42.7986163Z aot_graph_capture = aot_stage1_graph_capture(aot_state, functional_call) 2025-09-07T07:34:42.7987241Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7987468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 161, in aot_stage1_graph_capture 2025-09-07T07:34:42.7987513Z aot_dispatch_autograd_graph( 2025-09-07T07:34:42.7987699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 436, in aot_dispatch_autograd_graph 2025-09-07T07:34:42.7987739Z fx_g = _create_graph( 2025-09-07T07:34:42.7987774Z ^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7987938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 82, in _create_graph 2025-09-07T07:34:42.7987972Z fx_g = make_fx( 2025-09-07T07:34:42.7988005Z ^^^^^^^^ 2025-09-07T07:34:42.7988156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2429, in wrapped 2025-09-07T07:34:42.7988201Z return make_fx_tracer.trace(f, *args) 2025-09-07T07:34:42.7988240Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7988385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2356, in trace 2025-09-07T07:34:42.7988428Z return self._trace_inner(f, *args) 2025-09-07T07:34:42.7988464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7988622Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7988679Z t = dispatch_trace( 2025-09-07T07:34:42.7988714Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7988826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7988868Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.7988903Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7989028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7990042Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7990081Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7990243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.7990323Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.7990364Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7990490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7990528Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7990563Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7990688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.7990729Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.7990763Z ^^^^^^^^^ 2025-09-07T07:34:42.7990900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 721, in flatten_fn 2025-09-07T07:34:42.7990940Z tree_out = root_fn(*tree_args) 2025-09-07T07:34:42.7990975Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7991145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.7991194Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.7991230Z 
^^^^^^^^^^^ 2025-09-07T07:34:42.7991387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture.py", line 70, in inner_f 2025-09-07T07:34:42.7991449Z out, out_descs = call_and_expect_output_descs(f, args) 2025-09-07T07:34:42.7991492Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7991668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7991722Z outs_pair = fn(*args) 2025-09-07T07:34:42.7992738Z ^^^^^^^^^ 2025-09-07T07:34:42.7992911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1095, in inner_fn 2025-09-07T07:34:42.7992979Z outs, outs_descs = call_and_expect_output_descs(fn, args) 2025-09-07T07:34:42.7993023Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7993201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in call_and_expect_output_descs 2025-09-07T07:34:42.7993239Z outs_pair = fn(*args) 2025-09-07T07:34:42.7993273Z ^^^^^^^^^ 2025-09-07T07:34:42.7993453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1043, in joint_helper 2025-09-07T07:34:42.7993513Z return _functionalized_f_helper(primals, tangents) 2025-09-07T07:34:42.7993557Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7993753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 801, in _functionalized_f_helper 2025-09-07T07:34:42.7993822Z f_outs, f_outs_descs = call_and_expect_output_descs(fn, f_args) 2025-09-07T07:34:42.7993869Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7994041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 549, in 
call_and_expect_output_descs 2025-09-07T07:34:42.7994097Z outs_pair = fn(*args) 2025-09-07T07:34:42.7994130Z ^^^^^^^^^ 2025-09-07T07:34:42.7994320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.7994364Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.7994402Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7994570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 396, in inner_fn 2025-09-07T07:34:42.7994617Z backward_out = torch.autograd.grad( 2025-09-07T07:34:42.7995623Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7995752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 452, in grad 2025-09-07T07:34:42.7995797Z return handle_torch_function( 2025-09-07T07:34:42.7995833Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7995974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/overrides.py", line 1728, in handle_torch_function 2025-09-07T07:34:42.7996049Z result = mode.__torch_function__(public_api, types, args, kwargs) 2025-09-07T07:34:42.7996094Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7996263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.7996304Z return func(*args, **kwargs) 2025-09-07T07:34:42.7996339Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7996464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/__init__.py", line 503, in grad 2025-09-07T07:34:42.7996596Z result = _engine_run_backward( 2025-09-07T07:34:42.7996634Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7996780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward 2025-09-07T07:34:42.7996900Z return 
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.7996948Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7997075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/function.py", line 315, in apply 2025-09-07T07:34:42.7997138Z return user_fn(self, *args) 2025-09-07T07:34:42.7997193Z ^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7997338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 876, in backward 2025-09-07T07:34:42.7997381Z body_gm = materialize_as_graph( 2025-09-07T07:34:42.7997417Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7998554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1155, in materialize_as_graph 2025-09-07T07:34:42.7998600Z gm = _materialize_as_graph_inner() 2025-09-07T07:34:42.7998636Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7998758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.7998798Z return fn(*args, **kwargs) 2025-09-07T07:34:42.7998833Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7998999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 1153, in _materialize_as_graph_inner 2025-09-07T07:34:42.7999051Z return _maybe_reenter_make_fx(fn)(*unfunc_t) 2025-09-07T07:34:42.7999091Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7999227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 124, in wrapped 2025-09-07T07:34:42.7999277Z gm = _CURRENT_MAKE_FX_TRACER.trace_subgraph( 2025-09-07T07:34:42.7999340Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7999503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2374, in trace_subgraph 
2025-09-07T07:34:42.7999550Z return sub_tracer._trace_inner(f, *args) 2025-09-07T07:34:42.7999588Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7999749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 2318, in _trace_inner 2025-09-07T07:34:42.7999788Z t = dispatch_trace( 2025-09-07T07:34:42.7999823Z ^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.7999937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner 2025-09-07T07:34:42.7999980Z return disable_fn(*args, **kwargs) 2025-09-07T07:34:42.8000018Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8001171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.8001214Z return fn(*args, **kwargs) 2025-09-07T07:34:42.8001250Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8001410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1303, in dispatch_trace 2025-09-07T07:34:42.8001488Z graph = tracer.trace(root, concrete_args) # type: ignore[arg-type] 2025-09-07T07:34:42.8001528Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8001656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn 2025-09-07T07:34:42.8001694Z return fn(*args, **kwargs) 2025-09-07T07:34:42.8001728Z ^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8001853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/_symbolic_trace.py", line 868, in trace 2025-09-07T07:34:42.8001913Z (self.create_arg(fn(*args)),), 2025-09-07T07:34:42.8001947Z ^^^^^^^^^ 2025-09-07T07:34:42.8002101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1361, in wrapped 2025-09-07T07:34:42.8002149Z out = f(*tensors) # type:ignore[call-arg] 2025-09-07T07:34:42.8002183Z ^^^^^^^^^^^ 2025-09-07T07:34:42.8002224Z File "", line 1, in 
2025-09-07T07:34:42.8002366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 839, in body_fn 2025-09-07T07:34:42.8002445Z bw_body_fn(*selected_fw_carries, *additional_inputs, *grad_carries), 2025-09-07T07:34:42.8002526Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8002662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 769, in flat_fn 2025-09-07T07:34:42.8002709Z grad_args = bw_fn(primals, tangents)[1] 2025-09-07T07:34:42.8002748Z ^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8003905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 430, in inner_fn_with_anomaly 2025-09-07T07:34:42.8003950Z return inner_fn(primals, tangents) 2025-09-07T07:34:42.8003986Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8004157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 289, in inner_fn 2025-09-07T07:34:42.8004200Z outs, tangent_mask = fn(*primals) 2025-09-07T07:34:42.8004239Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.8004384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 503, in fw_with_masks 2025-09-07T07:34:42.8004427Z fw_out = pytree.tree_map_only( 2025-09-07T07:34:42.8004462Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8004598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1558, in tree_map_only 2025-09-07T07:34:42.8004704Z return tree_map(map_only(type_or_types_or_pred)(func), tree, is_leaf=is_leaf) 2025-09-07T07:34:42.8004750Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8004873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1376, in tree_map 2025-09-07T07:34:42.8004933Z return treespec.unflatten(map(func, 
*flat_args)) 2025-09-07T07:34:42.8004976Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8005105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1193, in unflatten 2025-09-07T07:34:42.8005143Z leaves = list(leaves) 2025-09-07T07:34:42.8005179Z ^^^^^^^^^^^^ 2025-09-07T07:34:42.8005301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_pytree.py", line 1493, in wrapped 2025-09-07T07:34:42.8005337Z return func(x) 2025-09-07T07:34:42.8005370Z ^^^^^^^ 2025-09-07T07:34:42.8006466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/utils.py", line 504, in 2025-09-07T07:34:42.8006677Z torch.Tensor, lambda x: x.requires_grad_(True), fw_out 2025-09-07T07:34:42.8006719Z ^^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8006886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/fx/experimental/proxy_tensor.py", line 1409, in __torch_function__ 2025-09-07T07:34:42.8006927Z return func(*args, **kwargs) 2025-09-07T07:34:42.8006963Z ^^^^^^^^^^^^^^^^^^^^^ 2025-09-07T07:34:42.8007146Z torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.8007231Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.8007234Z 2025-09-07T07:34:42.8007467Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.8007471Z 2025-09-07T07:34:42.8007472Z 2025-09-07T07:34:42.8007545Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.8007752Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.8007754Z 2025-09-07T07:34:42.8007840Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.8007953Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.8007988Z inline_call [] 2025-09-07T07:34:42.8008045Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.8008079Z inductor [] 2025-09-07T07:34:42.8008155Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.8008228Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.8008489Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.8008605Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.8008655Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.8008807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.8009890Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.8010023Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.8010145Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.8010215Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.8010275Z inline_call [] 2025-09-07T07:34:42.8010330Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.8010402Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.8010474Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.8010727Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.8010844Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.8010893Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.8011044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.8011129Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.8011261Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.8011378Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.8011447Z ----------------------------- Captured stdout call ----------------------------- 2025-09-07T07:34:42.8011481Z inline_call [] 2025-09-07T07:34:42.8011536Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-09-07T07:34:42.8011610Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-09-07T07:34:42.8011679Z ----------------------------- Captured stderr call ----------------------------- 2025-09-07T07:34:42.8011944Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:841: UserWarning: Error detected in WhileLoopAutogradOpBackward. Traceback of forward call that caused the error: 2025-09-07T07:34:42.8013025Z File "/var/lib/jenkins/pytorch/test/inductor/test_control_flow.py", line 1020, in forward 2025-09-07T07:34:42.8013078Z return torch._higher_order_ops.while_loop( 2025-09-07T07:34:42.8013227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_higher_order_ops/while_loop.py", line 227, in while_loop 2025-09-07T07:34:42.8013311Z return while_loop_op(flat_cond_fn, flat_body_fn, tuple(flat_inputs), tuple()) 2025-09-07T07:34:42.8013440Z (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_anomaly_mode.cpp:122.) 
2025-09-07T07:34:42.8013586Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-09-07T07:34:42.8013802Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f5a4b9704255c2ad.xml - 2025-09-07T07:34:42.8013861Z =========================== short test summary info ============================ 2025-09-07T07:34:42.8014236Z FAILED [0.7628s] inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True - torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 2025-09-07T07:34:42.8014320Z RuntimeError: only Tensors of floating point dtype can require gradients 2025-09-07T07:34:42.8014322Z 2025-09-07T07:34:42.8014528Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-09-07T07:34:42.8014531Z 2025-09-07T07:34:42.8014535Z 2025-09-07T07:34:42.8014608Z To execute this test, run the following from the base repo dir: 2025-09-07T07:34:42.8014815Z PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_control_flow.py WhileLoopTests.test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True 2025-09-07T07:34:42.8014817Z 2025-09-07T07:34:42.8014901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T07:34:42.8014977Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-09-07T07:34:42.8015044Z ================== 1 failed, 245 deselected, 2 rerun in 2.80s ================== 2025-09-07T07:34:42.8015079Z Got exit code 1 2025-09-07T07:34:42.8015202Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-09-07T07:34:42.8015625Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T07:34:42.8015666Z import pkg_resources 2025-09-07T07:34:42.8015836Z Test results will be stored in test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ea3210dd82ddac66.xml 2025-09-07T07:34:42.8016947Z ============================= test session starts ============================== 2025-09-07T07:34:42.8017063Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T07:34:42.8017103Z cachedir: .pytest_cache 2025-09-07T07:34:42.8017258Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T07:34:42.8017304Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T07:34:42.8017343Z configfile: pytest.ini 2025-09-07T07:34:42.8017504Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T07:34:42.8017579Z collecting ... collected 467 items / 81 deselected / 386 selected 2025-09-07T07:34:42.8017630Z stepcurrent: skipping 81 already run items. 
2025-09-07T07:34:42.8017704Z Running 165 items in this shard 2025-09-07T07:34:42.8017707Z 2025-09-07T07:34:42.8017899Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_True_autograd_False PASSED [3.1658s] [ 0%] 2025-09-07T07:34:42.8018047Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_False PASSED [1.2469s] [ 1%] 2025-09-07T07:34:42.8018187Z inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_True PASSED [2.5977s] [ 1%] 2025-09-07T07:34:42.8018387Z inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu PASSED [1.1468s] [ 2%] 2025-09-07T07:34:42.8018592Z inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_cpu PASSED [0.0141s] [ 3%] 2025-09-07T07:34:42.8018789Z inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_device_cuda SKIPPED [0.0002s] [ 3%] 2025-09-07T07:34:42.8018952Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1 PASSED [1.4357s] [ 4%] 2025-09-07T07:34:42.8019113Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1 PASSED [1.1729s] [ 4%] 2025-09-07T07:34:42.8019272Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1 PASSED [1.2727s] [ 5%] 2025-09-07T07:34:42.8019432Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5 PASSED [1.9539s] [ 6%] 2025-09-07T07:34:42.8019590Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1 PASSED [1.1676s] [ 6%] 
2025-09-07T07:34:42.8019748Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5 PASSED [1.5774s] [ 7%] 2025-09-07T07:34:42.8020886Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5 PASSED [1.5386s] [ 7%] 2025-09-07T07:34:42.8021069Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5 PASSED [3.1928s] [ 8%] 2025-09-07T07:34:42.8021228Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1 PASSED [1.9442s] [ 9%] 2025-09-07T07:34:42.8021387Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1 PASSED [1.8387s] [ 9%] 2025-09-07T07:34:42.8021549Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5 PASSED [3.0265s] [ 10%] 2025-09-07T07:34:42.8021709Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1 PASSED [1.8409s] [ 10%] 2025-09-07T07:34:42.8021869Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5 PASSED [2.7808s] [ 11%] 2025-09-07T07:34:42.8022024Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5 PASSED [3.1324s] [ 12%] 2025-09-07T07:34:42.8022187Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5 PASSED [2.1097s] [ 12%] 2025-09-07T07:34:42.8022348Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5 PASSED [2.2932s] [ 13%] 2025-09-07T07:34:42.8022509Z 
inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1 PASSED [1.5203s] [ 13%] 2025-09-07T07:34:42.8022679Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5 PASSED [1.8585s] [ 14%] 2025-09-07T07:34:42.8022840Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1 PASSED [1.3696s] [ 15%] 2025-09-07T07:34:42.8022998Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5 PASSED [2.2560s] [ 15%] 2025-09-07T07:34:42.8023158Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5 PASSED [3.2907s] [ 16%] 2025-09-07T07:34:42.8023344Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1 PASSED [1.9628s] [ 16%] 2025-09-07T07:34:42.8023504Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5 PASSED [3.7272s] [ 17%] 2025-09-07T07:34:42.8023663Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1 PASSED [2.0090s] [ 18%] 2025-09-07T07:34:42.8023822Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5 PASSED [3.2962s] [ 18%] 2025-09-07T07:34:42.8023980Z inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5 PASSED [3.8780s] [ 19%] 2025-09-07T07:34:42.8024109Z inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False PASSED [4.0607s] [ 20%] 2025-09-07T07:34:42.8025206Z inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_True PASSED [6.6868s] [ 20%] 2025-09-07T07:34:42.8025337Z 
inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_False PASSED [3.6843s] [ 21%] 2025-09-07T07:34:42.8025464Z inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True PASSED [6.4564s] [ 21%] 2025-09-07T07:34:42.8025618Z inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_True PASSED [6.0728s] [ 22%] 2025-09-07T07:34:42.8025788Z inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_False PASSED [3.8875s] [ 23%] 2025-09-07T07:34:42.8026040Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5 [W907 07:31:05.226755245 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026149Z [W907 07:31:05.262761952 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026252Z [W907 07:31:05.277093411 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026353Z [W907 07:31:05.295031976 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026453Z [W907 07:31:05.312025545 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026618Z [W907 07:31:05.332956395 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026718Z [W907 07:31:05.349982174 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026816Z [W907 07:31:05.367131530 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8026913Z [W907 07:31:05.381025415 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8027011Z [W907 07:31:05.398021065 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8027110Z [W907 07:31:05.415019913 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8027208Z [W907 07:31:05.429017736 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8027378Z [W907 07:31:05.443017309 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8027477Z [W907 07:31:06.901906717 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8027576Z [W907 07:31:06.914036198 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8027673Z [W907 07:31:06.931008368 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8028802Z [W907 07:31:06.946466429 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8028902Z [W907 07:31:06.959018134 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029039Z [W907 07:31:06.395244517 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029138Z [W907 07:31:06.407018963 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029236Z [W907 07:31:06.430887240 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029335Z [W907 07:31:06.445014712 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029434Z [W907 07:31:06.460015610 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029473Z PASSED [1.2860s] [ 23%] 2025-09-07T07:34:42.8029717Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5 [W907 07:31:06.538750376 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8029815Z [W907 07:31:06.550087199 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8029915Z [W907 07:31:06.566056733 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030012Z [W907 07:31:06.580026076 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030109Z [W907 07:31:07.596020849 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030207Z [W907 07:31:07.611021559 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030374Z [W907 07:31:07.628120885 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030472Z [W907 07:31:07.642022560 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030570Z [W907 07:31:07.659017989 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030667Z [W907 07:31:07.674018627 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8030899Z [W907 07:31:07.689017405 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8031001Z [W907 07:31:07.702018604 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8032499Z [W907 07:31:07.172967963 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8032603Z [W907 07:31:07.189457239 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8032703Z [W907 07:31:07.213433476 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8032801Z [W907 07:31:07.235515719 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8032900Z [W907 07:31:07.254032906 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8032998Z [W907 07:31:08.727190253 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033145Z [W907 07:31:08.739177376 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033243Z [W907 07:31:08.755017872 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033362Z [W907 07:31:08.770360346 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033460Z [W907 07:31:08.784017983 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033499Z PASSED [1.3252s] [ 24%] 2025-09-07T07:34:42.8033743Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5 [W907 07:31:08.864393236 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033840Z [W907 07:31:08.876184302 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8033969Z [W907 07:31:08.892131736 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8034067Z [W907 07:31:08.912081661 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8034166Z [W907 07:31:08.925054059 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8034263Z [W907 07:31:08.939033203 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8034362Z [W907 07:31:08.955324682 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8034507Z [W907 07:31:08.971038410 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8034605Z [W907 07:31:08.984041757 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8034703Z [W907 07:31:08.999034666 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036018Z [W907 07:31:08.015034840 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036119Z [W907 07:31:08.033965020 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036220Z [W907 07:31:08.502178551 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036317Z [W907 07:31:08.513044899 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036442Z [W907 07:31:08.526021538 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036605Z [W907 07:31:08.544026472 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036703Z [W907 07:31:08.564020817 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036800Z [W907 07:31:09.999255985 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036900Z [W907 07:31:09.010038485 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8036996Z [W907 07:31:09.025022664 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8037094Z [W907 07:31:09.038022452 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8037239Z [W907 07:31:09.058019556 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8037280Z PASSED [1.2717s] [ 24%] 2025-09-07T07:34:42.8037523Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5 [W907 07:31:09.135288645 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8037621Z [W907 07:31:09.149130159 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8037717Z [W907 07:31:09.166131219 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8037818Z [W907 07:31:09.185035919 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8037916Z [W907 07:31:09.199026752 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8038040Z [W907 07:31:09.215031106 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8038138Z [W907 07:31:09.231215267 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8038237Z [W907 07:31:09.247038993 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8040344Z [W907 07:31:09.266025382 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8040460Z [W907 07:31:09.281021310 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8040563Z [W907 07:31:09.295021684 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8040717Z [W907 07:31:09.311020018 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8040816Z [W907 07:31:10.335178072 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8040918Z [W907 07:31:10.346039532 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8041015Z [W907 07:31:10.362023335 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041115Z [W907 07:31:10.377020574 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041212Z [W907 07:31:10.392016093 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041309Z [W907 07:31:11.865833630 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041406Z [W907 07:31:11.877239662 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041505Z [W907 07:31:11.890021853 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041662Z [W907 07:31:11.905016071 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041762Z [W907 07:31:11.919018664 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8041800Z PASSED [1.8558s] [ 25%] 2025-09-07T07:34:42.8042047Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1 [W907 07:31:11.997445905 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8042165Z [W907 07:31:11.009097464 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8042264Z [W907 07:31:11.026164791 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8042361Z [W907 07:31:11.043029972 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8042463Z [W907 07:31:11.520680852 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8043855Z [W907 07:31:12.916618532 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8043899Z PASSED [0.9936s] [ 26%] 2025-09-07T07:34:42.8044162Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5 [W907 07:31:12.990910543 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044265Z [W907 07:31:12.005426409 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044415Z [W907 07:31:12.019059567 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044514Z [W907 07:31:12.035350458 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044616Z [W907 07:31:12.052025091 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044715Z [W907 07:31:12.069021089 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044812Z [W907 07:31:12.086247555 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8044933Z [W907 07:31:12.107029288 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045034Z [W907 07:31:12.121017491 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045133Z [W907 07:31:12.135025774 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045233Z [W907 07:31:12.151019088 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045332Z [W907 07:31:12.166021466 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045464Z [W907 07:31:13.631376229 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045564Z [W907 07:31:13.642040841 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8045712Z [W907 07:31:13.656023885 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045811Z [W907 07:31:13.672023318 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8045909Z [W907 07:31:13.690021373 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8046007Z [W907 07:31:13.131078484 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8046104Z [W907 07:31:13.143037678 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8047605Z [W907 07:31:13.163076352 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8047711Z [W907 07:31:13.179957992 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8047810Z [W907 07:31:13.194195652 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8047847Z PASSED [1.2893s] [ 26%] 2025-09-07T07:34:42.8048092Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5 [W907 07:31:13.282918230 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048228Z [W907 07:31:13.295130830 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048326Z [W907 07:31:13.309082374 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048423Z [W907 07:31:13.332034345 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048523Z [W907 07:31:13.344027358 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048622Z [W907 07:31:13.359021865 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8048720Z [W907 07:31:13.376082064 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048875Z [W907 07:31:13.391027873 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8048974Z [W907 07:31:13.408032432 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049073Z [W907 07:31:13.422020955 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049171Z [W907 07:31:13.436022868 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049267Z [W907 07:31:13.453019808 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049366Z [W907 07:31:14.405217216 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049466Z [W907 07:31:14.416282022 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049563Z [W907 07:31:14.432026879 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049677Z [W907 07:31:14.454356409 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049775Z [W907 07:31:14.476022549 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8049873Z [W907 07:31:15.397080738 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8051204Z [W907 07:31:15.411015761 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8051318Z [W907 07:31:15.428383695 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8051417Z [W907 07:31:15.444019733 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8051559Z [W907 07:31:15.460019178 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8051597Z PASSED [2.2578s] [ 27%] 2025-09-07T07:34:42.8051842Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5 [W907 07:31:15.539980576 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8051941Z [W907 07:31:15.554122877 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052040Z [W907 07:31:15.570062571 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052138Z [W907 07:31:15.585035370 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052283Z [W907 07:31:16.602029099 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052381Z [W907 07:31:16.620011323 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052481Z [W907 07:31:16.637098080 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052578Z [W907 07:31:16.655942962 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052675Z [W907 07:31:16.673015730 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052773Z [W907 07:31:16.688023668 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052887Z [W907 07:31:16.704023731 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8052985Z [W907 07:31:16.719019460 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8053081Z [W907 07:31:17.656144201 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8053179Z [W907 07:31:17.671208818 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8053277Z [W907 07:31:17.685029274 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8053375Z [W907 07:31:17.702026633 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8054674Z [W907 07:31:17.719020002 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8054777Z [W907 07:31:18.603291864 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8054879Z [W907 07:31:18.614021405 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8054977Z [W907 07:31:18.631014545 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055074Z [W907 07:31:18.652068733 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055172Z [W907 07:31:18.669178600 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055211Z PASSED [2.2062s] [ 27%] 2025-09-07T07:34:42.8055468Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1 [W907 07:31:18.747617221 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055584Z [W907 07:31:18.761103071 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055685Z [W907 07:31:18.778163660 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055782Z [W907 07:31:18.792027485 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8055918Z [W907 07:31:19.713504708 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8056015Z [W907 07:31:19.122009031 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8056079Z PASSED [1.4479s] [ 28%] 2025-09-07T07:34:42.8056343Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5 [W907 07:31:19.195072470 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8056444Z [W907 07:31:19.207115653 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8056610Z [W907 07:31:19.225077618 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8056710Z [W907 07:31:19.242036916 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8056808Z [W907 07:31:19.259027206 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8058887Z [W907 07:31:19.274027294 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8058993Z [W907 07:31:19.291158791 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059093Z [W907 07:31:19.307974612 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059192Z [W907 07:31:19.325025200 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059292Z [W907 07:31:19.343024834 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059389Z [W907 07:31:19.360020513 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059520Z [W907 07:31:19.377025282 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059617Z [W907 07:31:20.341736255 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8059715Z [W907 07:31:20.354097453 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059812Z [W907 07:31:20.371028813 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8059911Z [W907 07:31:20.390024212 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060052Z [W907 07:31:20.405993216 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060152Z [W907 07:31:21.300402678 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060249Z [W907 07:31:21.317197000 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060348Z [W907 07:31:21.335021777 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060446Z [W907 07:31:21.351035689 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060545Z [W907 07:31:21.366013558 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060582Z PASSED [2.2476s] [ 29%] 2025-09-07T07:34:42.8060830Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5 [W907 07:31:21.443064910 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8060929Z [W907 07:31:21.457104083 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8061045Z [W907 07:31:21.472068251 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063189Z [W907 07:31:21.487030140 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063299Z [W907 07:31:21.503023293 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8063396Z [W907 07:31:21.519021187 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063495Z [W907 07:31:21.534145834 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063593Z [W907 07:31:21.551037494 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063738Z [W907 07:31:21.572363579 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063837Z [W907 07:31:22.589036702 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8063936Z [W907 07:31:22.603028675 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064033Z [W907 07:31:22.620028724 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064204Z [W907 07:31:22.586584311 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064302Z [W907 07:31:23.598419046 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064400Z [W907 07:31:23.615030880 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064498Z [W907 07:31:23.632027259 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064611Z [W907 07:31:23.646024042 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064710Z [W907 07:31:23.560465199 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064811Z [W907 07:31:23.571024563 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8064909Z [W907 07:31:23.585014065 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8065026Z [W907 07:31:24.600013364 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8065123Z [W907 07:31:24.618972994 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8065161Z PASSED [2.2515s] [ 29%] 2025-09-07T07:34:42.8067739Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5 [W907 07:31:24.698709155 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8067853Z [W907 07:31:24.782335110 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8067953Z [W907 07:31:24.782529027 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068052Z [W907 07:31:24.782593716 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068153Z [W907 07:31:24.782648105 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068251Z [W907 07:31:24.782699724 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068348Z [W907 07:31:24.782949160 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068446Z [W907 07:31:24.783014899 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068546Z [W907 07:31:24.783079058 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068644Z [W907 07:31:24.783117949 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068741Z [W907 07:31:24.783155468 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8068873Z [W907 07:31:24.783193007 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8068973Z [W907 07:31:24.326921292 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8069072Z [W907 07:31:24.338377772 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8069169Z [W907 07:31:24.338566770 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8069268Z [W907 07:31:24.338616699 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8069404Z [W907 07:31:24.338659359 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072088Z [W907 07:31:25.783486665 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072197Z [W907 07:31:25.783748121 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072345Z [W907 07:31:25.783785050 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072445Z [W907 07:31:25.783816030 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072543Z [W907 07:31:25.783844909 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072581Z PASSED [1.1661s] [ 30%] 2025-09-07T07:34:42.8072834Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5 [W907 07:31:25.861567220 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8072936Z [W907 07:31:25.861804007 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073035Z [W907 07:31:25.861901646 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073133Z [W907 07:31:25.861959225 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8073231Z [W907 07:31:25.862024074 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073382Z [W907 07:31:25.862079543 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073482Z [W907 07:31:25.862193771 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073580Z [W907 07:31:25.862243141 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073728Z [W907 07:31:25.862282561 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073828Z [W907 07:31:25.862318269 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8073938Z [W907 07:31:25.862361640 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074037Z [W907 07:31:25.862398879 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074137Z [W907 07:31:25.421679294 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074235Z [W907 07:31:25.433119534 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074333Z [W907 07:31:25.433237103 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074431Z [W907 07:31:25.433287182 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074531Z [W907 07:31:25.433329191 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074631Z [W907 07:31:26.557575537 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074733Z [W907 07:31:26.557836293 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8074844Z [W907 07:31:26.557880132 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8074947Z [W907 07:31:26.557919303 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075047Z [W907 07:31:26.557959671 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075092Z PASSED [1.7740s] [ 30%] 2025-09-07T07:34:42.8075339Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1 [W907 07:31:27.634384852 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075443Z [W907 07:31:27.634611509 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075582Z [W907 07:31:27.634786147 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075729Z [W907 07:31:27.634855096 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075843Z [W907 07:31:27.326647622 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075945Z [W907 07:31:27.518830432 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8075986Z PASSED [0.9549s] [ 31%] 2025-09-07T07:34:42.8076234Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1 [W907 07:31:28.591921042 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8076333Z [W907 07:31:28.592332035 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8076441Z [W907 07:31:28.592589301 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8076624Z [W907 07:31:28.592654560 NNPACK.cpp:56] Could not initialize NNPACK! 
Reason: Unsupported hardware. 2025-09-07T07:34:42.8076729Z [W907 07:31:28.246714295 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8076831Z [W907 07:31:28.417393393 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8076899Z PASSED [0.8946s] [ 32%] 2025-09-07T07:34:42.8077190Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5 [W907 07:31:28.486439203 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079361Z [W907 07:31:28.486798297 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079462Z [W907 07:31:28.486930075 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079564Z [W907 07:31:28.486993784 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079664Z [W907 07:31:28.487073874 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079765Z [W907 07:31:28.487134673 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079864Z [W907 07:31:28.487303440 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8079965Z [W907 07:31:28.487362549 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080063Z [W907 07:31:28.487408478 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080249Z [W907 07:31:28.487450788 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080348Z [W907 07:31:28.487492047 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080448Z [W907 07:31:28.487534056 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8080596Z [W907 07:31:29.149963907 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080732Z [W907 07:31:29.161314570 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080831Z [W907 07:31:29.161515916 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8080933Z [W907 07:31:29.161583995 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8081031Z [W907 07:31:29.161630264 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8081132Z [W907 07:31:29.531264872 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8081231Z [W907 07:31:29.541104917 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8081378Z [W907 07:31:29.541186585 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8081476Z [W907 07:31:29.541221985 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8083176Z [W907 07:31:29.541257274 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8083253Z PASSED [1.1208s] [ 32%] 2025-09-07T07:34:42.8083509Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5 [W907 07:31:30.607790500 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8083610Z [W907 07:31:30.608077587 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8083709Z [W907 07:31:30.608201175 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8083824Z [W907 07:31:30.608266374 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8083928Z [W907 07:31:30.608320603 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084025Z [W907 07:31:30.608373262 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084126Z [W907 07:31:30.608522770 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084224Z [W907 07:31:30.608577369 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084347Z [W907 07:31:30.608617959 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084445Z [W907 07:31:30.608655369 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084592Z [W907 07:31:30.608691608 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084690Z [W907 07:31:30.608729167 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084790Z [W907 07:31:30.296249577 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084888Z [W907 07:31:30.309210956 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8084987Z [W907 07:31:30.309359143 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8085084Z [W907 07:31:30.309409543 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8085185Z [W907 07:31:30.309485631 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8085284Z [W907 07:31:31.687327708 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8085383Z [W907 07:31:31.697131283 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8087062Z [W907 07:31:31.697221842 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8087173Z [W907 07:31:31.697257961 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8087314Z [W907 07:31:31.697296090 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8087353Z PASSED [1.1551s] [ 33%] 2025-09-07T07:34:42.8087629Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1 [W907 07:31:31.766313711 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8087733Z [W907 07:31:31.766687655 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8087831Z [W907 07:31:31.766904371 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8087929Z [W907 07:31:31.766960890 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088026Z [W907 07:31:32.780085529 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088159Z [W907 07:31:32.977413442 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088196Z PASSED [1.2803s] [ 33%] 2025-09-07T07:34:42.8088443Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5 [W907 07:31:32.045558096 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088542Z [W907 07:31:32.045847841 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088640Z [W907 07:31:32.045961180 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088738Z [W907 07:31:32.046035839 NNPACK.cpp:56] Could not initialize NNPACK! 
Reason: Unsupported hardware. 2025-09-07T07:34:42.8088839Z [W907 07:31:32.046096158 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8088939Z [W907 07:31:32.046148877 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8089040Z [W907 07:31:32.046255666 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8089138Z [W907 07:31:32.046308185 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8089275Z [W907 07:31:32.046355904 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8089394Z [W907 07:31:32.046396103 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8090967Z [W907 07:31:32.046434533 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091113Z [W907 07:31:32.046471772 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091213Z [W907 07:31:33.076159805 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091315Z [W907 07:31:33.088186707 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091413Z [W907 07:31:33.088316656 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091512Z [W907 07:31:33.088369425 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091609Z [W907 07:31:33.088416214 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091709Z [W907 07:31:34.982249595 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8091808Z [W907 07:31:34.982637589 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8091904Z [W907 07:31:34.982681788 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092002Z [W907 07:31:34.982715739 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092101Z [W907 07:31:34.982747758 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092139Z PASSED [2.0123s] [ 34%] 2025-09-07T07:34:42.8092412Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5 [W907 07:31:34.060585917 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092512Z [W907 07:31:34.060943203 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092611Z [W907 07:31:34.061088120 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092709Z [W907 07:31:34.061156259 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092805Z [W907 07:31:34.061213608 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8092903Z [W907 07:31:34.061265808 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8093031Z [W907 07:31:34.061394596 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8093130Z [W907 07:31:34.061447775 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8094549Z [W907 07:31:34.061489524 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8094652Z [W907 07:31:34.061527983 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8094751Z [W907 07:31:34.061567243 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8094851Z [W907 07:31:34.061606952 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8094949Z [W907 07:31:35.088492727 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095046Z [W907 07:31:35.100383181 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095147Z [W907 07:31:35.100492209 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095245Z [W907 07:31:35.100544189 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095344Z [W907 07:31:35.100590088 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095446Z [W907 07:31:36.982076262 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095564Z [W907 07:31:36.982476686 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095663Z [W907 07:31:36.982534575 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095816Z [W907 07:31:36.982580454 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095914Z [W907 07:31:36.982623093 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8095952Z PASSED [2.0002s] [ 35%] 2025-09-07T07:34:42.8096197Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1 [W907 07:31:36.061765024 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8096295Z [W907 07:31:36.062066939 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8096394Z [W907 07:31:36.062264977 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8096621Z [W907 07:31:36.062327566 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8096720Z [W907 07:31:37.064193400 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8096817Z [W907 07:31:37.259275718 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098094Z PASSED [1.2663s] [ 35%] 2025-09-07T07:34:42.8098341Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1 [W907 07:31:37.327434830 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098471Z [W907 07:31:37.327803894 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098573Z [W907 07:31:37.328051591 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098676Z [W907 07:31:37.328116580 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098773Z [W907 07:31:38.460519695 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098874Z [W907 07:31:39.667212981 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8098910Z PASSED [1.4073s] [ 36%] 2025-09-07T07:34:42.8099170Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1 [W907 07:31:39.735908916 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8099285Z [W907 07:31:39.736323179 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8099386Z [W907 07:31:39.736562356 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8099483Z [W907 07:31:39.736622935 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8099584Z [W907 07:31:40.866101034 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8099682Z [W907 07:31:41.782048208 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8099719Z PASSED [2.1156s] [ 36%] 2025-09-07T07:34:42.8100004Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5 [W907 07:31:41.852080623 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8100106Z [W907 07:31:41.852472367 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8100204Z [W907 07:31:41.852635144 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8100304Z [W907 07:31:41.852717194 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8100401Z [W907 07:31:41.852790573 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8100521Z [W907 07:31:41.852859682 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8100618Z [W907 07:31:41.853063928 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8101981Z [W907 07:31:41.853142047 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102081Z [W907 07:31:41.853206006 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102186Z [W907 07:31:41.853265175 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102284Z [W907 07:31:41.853321484 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102384Z [W907 07:31:41.853381063 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8102481Z [W907 07:31:42.040954974 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102583Z [W907 07:31:42.053192942 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102680Z [W907 07:31:42.053573577 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102778Z [W907 07:31:42.053671156 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102875Z [W907 07:31:42.053729245 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8102977Z [W907 07:31:43.955272122 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103074Z [W907 07:31:43.966416377 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103202Z [W907 07:31:43.966595984 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103301Z [W907 07:31:43.966636743 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103400Z [W907 07:31:43.966673014 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103436Z PASSED [2.1805s] [ 37%] 2025-09-07T07:34:42.8103679Z inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1 [W907 07:31:43.036303824 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103804Z [W907 07:31:43.036694058 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8103904Z [W907 07:31:43.036965044 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8104003Z [W907 07:31:43.037060613 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 
2025-09-07T07:34:42.8104105Z [W907 07:31:44.181516971 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8105360Z [W907 07:31:44.392977755 NNPACK.cpp:56] Could not initialize NNPACK! Reason: Unsupported hardware. 2025-09-07T07:34:42.8105399Z PASSED [1.4271s] [ 38%] 2025-09-07T07:34:42.8105581Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5 PASSED [1.7827s] [ 38%] 2025-09-07T07:34:42.8105759Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5 PASSED [1.6998s] [ 39%] 2025-09-07T07:34:42.8105935Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5 PASSED [1.6907s] [ 40%] 2025-09-07T07:34:42.8106112Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1 PASSED [2.1609s] [ 40%] 2025-09-07T07:34:42.8106285Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5 PASSED [1.6773s] [ 41%] 2025-09-07T07:34:42.8106476Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1 PASSED [1.4760s] [ 41%] 2025-09-07T07:34:42.8106733Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1 PASSED [1.4822s] [ 42%] 2025-09-07T07:34:42.8106906Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5 PASSED [1.7583s] [ 43%] 2025-09-07T07:34:42.8107081Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1 PASSED [2.0684s] [ 43%] 2025-09-07T07:34:42.8107254Z 
inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5 PASSED [1.6981s] [ 44%] 2025-09-07T07:34:42.8107427Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1 PASSED [1.4634s] [ 44%] 2025-09-07T07:34:42.8107601Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5 PASSED [1.6668s] [ 45%] 2025-09-07T07:34:42.8107774Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1 PASSED [1.5478s] [ 46%] 2025-09-07T07:34:42.8107947Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5 PASSED [3.0010s] [ 46%] 2025-09-07T07:34:42.8108119Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1 PASSED [1.5316s] [ 47%] 2025-09-07T07:34:42.8108312Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5 PASSED [2.2517s] [ 47%] 2025-09-07T07:34:42.8108485Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1 PASSED [1.5388s] [ 48%] 2025-09-07T07:34:42.8108657Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5 PASSED [2.9648s] [ 49%] 2025-09-07T07:34:42.8108828Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1 PASSED [1.4816s] [ 49%] 2025-09-07T07:34:42.8110214Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5 PASSED [2.2807s] [ 50%] 2025-09-07T07:34:42.8110390Z 
inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1 PASSED [1.5098s] [ 50%] 2025-09-07T07:34:42.8110562Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1 PASSED [1.5162s] [ 51%] 2025-09-07T07:34:42.8110737Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5 PASSED [2.9815s] [ 52%] 2025-09-07T07:34:42.8110909Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1 PASSED [1.5076s] [ 52%] 2025-09-07T07:34:42.8111080Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1 PASSED [1.5413s] [ 53%] 2025-09-07T07:34:42.8111250Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5 PASSED [2.2081s] [ 53%] 2025-09-07T07:34:42.8111426Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1 PASSED [1.3819s] [ 54%] 2025-09-07T07:34:42.8111602Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5 PASSED [1.8586s] [ 55%] 2025-09-07T07:34:42.8111799Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5 PASSED [2.6520s] [ 55%] 2025-09-07T07:34:42.8111972Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1 PASSED [1.3824s] [ 56%] 2025-09-07T07:34:42.8112150Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5 PASSED [1.8644s] [ 56%] 
2025-09-07T07:34:42.8112322Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1 PASSED [1.4187s] [ 57%] 2025-09-07T07:34:42.8112496Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1 PASSED [1.4065s] [ 58%] 2025-09-07T07:34:42.8112669Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1 PASSED [1.3861s] [ 58%] 2025-09-07T07:34:42.8112843Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5 PASSED [2.6123s] [ 59%] 2025-09-07T07:34:42.8113016Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1 PASSED [1.3745s] [ 60%] 2025-09-07T07:34:42.8113191Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5 PASSED [2.3952s] [ 60%] 2025-09-07T07:34:42.8113387Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5 PASSED [2.4036s] [ 61%] 2025-09-07T07:34:42.8113558Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1 PASSED [1.4150s] [ 61%] 2025-09-07T07:34:42.8113731Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1 PASSED [2.3345s] [ 62%] 2025-09-07T07:34:42.8115050Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1 PASSED [1.4474s] [ 63%] 2025-09-07T07:34:42.8115247Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5 PASSED 
[2.3972s] [ 63%] 2025-09-07T07:34:42.8115433Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1 PASSED [1.4495s] [ 64%] 2025-09-07T07:34:42.8115607Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5 PASSED [2.3465s] [ 64%] 2025-09-07T07:34:42.8115780Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5 PASSED [3.3276s] [ 65%] 2025-09-07T07:34:42.8115952Z inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1 PASSED [1.4810s] [ 66%] 2025-09-07T07:34:42.8116119Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1 PASSED [0.9211s] [ 66%] 2025-09-07T07:34:42.8116288Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1 PASSED [0.9236s] [ 67%] 2025-09-07T07:34:42.8116453Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5 PASSED [1.1359s] [ 67%] 2025-09-07T07:34:42.8116699Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1 PASSED [0.9223s] [ 68%] 2025-09-07T07:34:42.8116886Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1 PASSED [0.9186s] [ 69%] 2025-09-07T07:34:42.8117050Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5 PASSED [1.1409s] [ 69%] 2025-09-07T07:34:42.8117213Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1 PASSED [0.9346s] [ 70%] 2025-09-07T07:34:42.8117379Z 
inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5 PASSED [1.1324s] [ 70%] 2025-09-07T07:34:42.8117543Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1 PASSED [1.5614s] [ 71%] 2025-09-07T07:34:42.8117707Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5 PASSED [3.1688s] [ 72%] 2025-09-07T07:34:42.8117870Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1 PASSED [1.5172s] [ 72%] 2025-09-07T07:34:42.8118033Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1 PASSED [1.5410s] [ 73%] 2025-09-07T07:34:42.8118196Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5 PASSED [2.3053s] [ 73%] 2025-09-07T07:34:42.8118360Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1 PASSED [1.5536s] [ 74%] 2025-09-07T07:34:42.8118546Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5 PASSED [1.0389s] [ 75%] 2025-09-07T07:34:42.8119877Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5 PASSED [0.9970s] [ 75%] 2025-09-07T07:34:42.8120050Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5 PASSED [0.9748s] [ 76%] 2025-09-07T07:34:42.8120314Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1 PASSED [0.8120s] [ 76%] 2025-09-07T07:34:42.8120479Z 
inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5 PASSED [1.1463s] [ 77%] 2025-09-07T07:34:42.8120693Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5 PASSED [1.1285s] [ 78%] 2025-09-07T07:34:42.8120861Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1 PASSED [1.4128s] [ 78%] 2025-09-07T07:34:42.8121025Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5 PASSED [3.1947s] [ 79%] 2025-09-07T07:34:42.8121190Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5 PASSED [2.2250s] [ 80%] 2025-09-07T07:34:42.8121355Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1 PASSED [1.4079s] [ 80%] 2025-09-07T07:34:42.8121521Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5 PASSED [2.1750s] [ 81%] 2025-09-07T07:34:42.8121684Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5 PASSED [2.3623s] [ 81%] 2025-09-07T07:34:42.8121850Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1 PASSED [1.4387s] [ 82%] 2025-09-07T07:34:42.8122016Z inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5 PASSED [2.3684s] [ 83%] 2025-09-07T07:34:42.8122198Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0 PASSED [1.2937s] [ 83%] 2025-09-07T07:34:42.8122351Z 
inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0 PASSED [1.3322s] [ 84%] 2025-09-07T07:34:42.8122506Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2 PASSED [2.3240s] [ 84%] 2025-09-07T07:34:42.8122657Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1 PASSED [1.8320s] [ 85%] 2025-09-07T07:34:42.8122809Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_1 PASSED [1.9270s] [ 86%] 2025-09-07T07:34:42.8122957Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2 PASSED [1.9595s] [ 86%] 2025-09-07T07:34:42.8123112Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0 PASSED [1.3353s] [ 87%] 2025-09-07T07:34:42.8123266Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1 PASSED [1.3606s] [ 87%] 2025-09-07T07:34:42.8124576Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_0 PASSED [1.4107s] [ 88%] 2025-09-07T07:34:42.8124734Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_1 PASSED [1.9199s] [ 89%] 2025-09-07T07:34:42.8124884Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0 PASSED [1.9824s] [ 89%] 2025-09-07T07:34:42.8125057Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1 PASSED [3.1327s] [ 90%] 2025-09-07T07:34:42.8125208Z inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_2 PASSED [2.0701s] [ 90%] 2025-09-07T07:34:42.8125341Z 
inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_False PASSED [0.9842s] [ 91%] 2025-09-07T07:34:42.8125470Z inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_True PASSED [2.0827s] [ 92%] 2025-09-07T07:34:42.8125637Z inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_False PASSED [0.7907s] [ 92%] 2025-09-07T07:34:42.8125801Z inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_True PASSED [1.3771s] [ 93%] 2025-09-07T07:34:42.8125951Z inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_True_autograd_False PASSED [1.4044s] [ 93%] 2025-09-07T07:34:42.8126102Z inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_False_autograd_False PASSED [0.7885s] [ 94%] 2025-09-07T07:34:42.8126251Z inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_True_autograd_True PASSED [2.1173s] [ 95%] 2025-09-07T07:34:42.8126398Z inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_False_autograd_False PASSED [1.8105s] [ 95%] 2025-09-07T07:34:42.8126622Z inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_True PASSED [2.1184s] [ 96%] 2025-09-07T07:34:42.8126762Z inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_True PASSED [1.1018s] [ 96%] 2025-09-07T07:34:42.8126899Z inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_False PASSED [0.5779s] [ 97%] 2025-09-07T07:34:42.8127037Z inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_False PASSED [0.6823s] [ 98%] 2025-09-07T07:34:42.8127200Z inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_True PASSED [0.8318s] [ 98%] 2025-09-07T07:34:42.8127357Z 
inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_True PASSED [1.0961s] [ 99%] 2025-09-07T07:34:42.8127513Z inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_True PASSED [2.2913s] [100%] 2025-09-07T07:34:42.8127517Z 2025-09-07T07:34:42.8127749Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-ea3210dd82ddac66.xml - 2025-09-07T07:34:42.8129025Z ========== 164 passed, 1 skipped, 81 deselected in 309.31s (0:05:09) =========== 2025-09-07T07:34:42.8132008Z The following tests failed consistently: ['test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True', 
'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True', 'test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True'] 2025-09-07T07:34:42.8132036Z 2025-09-07T07:34:42.8132204Z FINISHED PRINTING LOG FILE of inductor/test_control_flow 1/2 (test/test-reports/inductor.test_control_flow_1.2_451e621b9be894b1_.log) 2025-09-07T07:34:42.8132207Z 2025-09-07T07:34:42.8132300Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T07:34:42.8132347Z Uploading artifacts took 0.00 seconds 2025-09-07T07:34:42.8132393Z inductor/test_control_flow 1/2 failed! 2025-09-07T07:34:42.8132474Z Running inductor/test_cpu_repro 4/5 ... [2025-09-07 07:34:42.215624] 2025-09-07T07:34:42.8132519Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:34:42.8132822Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_cpu_repro.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:34:42.215832] 2025-09-07T07:42:59.7847892Z 2025-09-07T07:42:59.7854448Z inductor/test_cpu_repro 4/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_4.5_4d5ffca55dfeebc1_.log 2025-09-07T07:42:59.7885938Z Running 137 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test__adaptive_avg_pool2d, test/inductor/test_cpu_repro.py::CPUReproTests::test_aten_normal_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_atomic_add_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_avx2_bool_constant_pad_nd, test/inductor/test_cpu_repro.py::CPUReproTests::test_bf16_zeros, test/inductor/test_cpu_repro.py::CPUReproTests::test_bool_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_cat_mul, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv2d_autocast, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_stride_constraints, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_used_from_multiple_places, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int32_to_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int64_to_int32_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_decomposed_dequant_relu_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_decomposed_fake_quant_per_channel, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_embedding_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_float32_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_full_boolean_dynamic_shape, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_large_size, test/inductor/test_cpu_repro.py::CPUReproTests::test_inplace_squeeze_needed, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_used_from_multiple_places, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_local_buffer_with_line_reuse, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_fill_softmax, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_load_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_no_redundant_to_dtypes_between_fused_scheduler_node, test/inductor/test_cpu_repro.py::CPUReproTests::test_ops_masked_with_bool_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_pack_padded_sequence_lstm, test/inductor/test_cpu_repro.py::CPUReproTests::test_parallel_num_threads, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_channel_fake_quant_uint8_bf16_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_tensor_fake_quant_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduction_float_to_int64, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_with_inf_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_scalar_sign_with_min, test/inductor/test_cpu_repro.py::CPUReproTests::test_slice_scatter_default_end_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_slice_scatter_issue122291, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_load_decomposed_dequant_add_relu_quant_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_store_channel_shuffle_cl_quant_output_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_sum_outer, test/inductor/test_cpu_repro.py::CPUReproTests::test_two_local_buffers_in_outer_loop_fusion_case2, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint8_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint8_sub, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_compare_op_cpu_only, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_contiguous_ModularIndexing, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_kernel_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_vertical_sum_cpu_only 2025-09-07T07:42:59.7914171Z 2025-09-07T07:42:59.7914296Z Running inductor/test_flex_decoding 1/2 ... [2025-09-07 07:42:59.785081] 2025-09-07T07:42:59.7914470Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:42:59.7914862Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_flex_decoding.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:42:59.785348] 2025-09-07T07:51:35.2826842Z 2025-09-07T07:51:35.2828035Z inductor/test_flex_decoding 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_1.2_e231bb6c3354e034_.log 2025-09-07T07:51:35.2890230Z Running 282 items in this shard: test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod0_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod0_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod0_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod1_head_dims2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod2_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod3_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod3_head_dims1_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod4_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod5_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod6_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod6_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_bfloat16_score_mod8_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod0_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod1_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod2_BLOCK_SIZE_64_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE2_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod3_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod5_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod5_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod7_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_bfloat16_score_mod8_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod0_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod1_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE_128_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod2_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod3_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod4_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod5_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod6_BLOCK_SIZE_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod7_BLOCK_SIZE_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float16_score_mod8_BLOCK_SIZE3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod0_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE3_cuda_float32, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod1_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod3_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod4_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod5_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod6_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod7_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_different_block_size_float32_score_mod8_BLOCK_SIZE_64_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod0_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod4_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod4_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod5_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod6_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod6_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod7_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod7_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod8_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod8_head_dims1_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float16_score_mod8_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod0_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod0_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod1_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod2_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod2_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod3_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod5_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod6_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod7_head_dims2_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod8_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_builtin_score_mods_float32_score_mod8_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_all_dims_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_all_dims_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_all_dims_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_bfloat16_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_captured_buffers_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_decode_at_different_input_position_float16_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_do_not_trigger_dynamic_shapes_on_empty_block_mask_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_fully_masked_out_rows_0_check_gqa_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_function_composition_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod4_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod4_head_dims2_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod5_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod5_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod6_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod7_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_head_dependent_mask_mod_float16_score_mod7_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims0_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod5_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims1_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims2_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod7_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims0_batch_dims3_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims0_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims1_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims2_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod2_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims1_batch_dims3_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod5_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims0_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod3_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims1_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims2_score_mod4_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod6_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod7_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_kv_batch_broadcast_float16_head_dims2_batch_dims3_score_mod8_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_larger_block_mask_bug_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_load_from_bias_seq_only_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_bfloat16_score_mod0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float16_score_mod0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float16_score_mod1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float32_score_mod0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_logsumexp_correctness_float32_score_mod1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls2_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls_paged_attention2_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_multiple_score_mod_calls_paged_attention_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_njt_causal_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_multi_token_offset_mask_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_offset_mask_cuda, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_divisible_offset_mask_with_captured_buffer_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod0_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod1_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod2_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod3_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod4_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_bfloat16_head_dims1_cuda_bfloat16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod5_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod6_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod7_float32_head_dims0_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_bfloat16_head_dims0_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_bfloat16_head_dims1_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float16_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float16_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_equal_head_dims_score_mod8_float32_head_dims1_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_121_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_non_pow_2_headdim_head_dim_24_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_not_pw_of_two_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_padded_dense_causal_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims0_page_size_256_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod0_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod1_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod2_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims0_page_size_64_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod3_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims2_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod4_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod5_head_dims2_page_size_128_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims2_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod6_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims1_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims1_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod7_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims0_page_size_128_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims0_page_size_256_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims0_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims1_page_size_256_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims1_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_paged_attention_page_size_float16_score_mod8_head_dims2_page_size_64_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_recompile_changed_score_mod_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_skip_odd_keys_bfloat16_cuda_bfloat16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_skip_odd_keys_float32_cuda_float32, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s0_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s0_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s0_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s2_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s3_head_dims0_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s1_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s1_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s2_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s3_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s3_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s2_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s0_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s1_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s1_head_dims1_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s2_head_dims0_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_strided_inputs_float16_k_s3_v_s3_head_dims2_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_subgraph_respect_decompostion_float16_cuda_float16, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_tma_decoding_float16_cuda_float16, 
test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_full_mask_vs_sdpa_cuda, test/inductor/test_flex_decoding.py::TestFlexDecodingCUDA::test_windowed_partial_block_vs_sdpa_paged_attention_cuda 2025-09-07T07:51:35.2939513Z 2025-09-07T07:51:35.2939603Z Running inductor/test_fx_fusion 1/1 ... [2025-09-07 07:51:35.282769] 2025-09-07T07:51:35.2939774Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:51:35.2940164Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_fx_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:51:35.282973] 2025-09-07T07:51:38.4128200Z 2025-09-07T07:51:38.4128986Z inductor/test_fx_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fx_fusion_1.1_77b4d6643924ae5c_.log 2025-09-07T07:51:38.4129725Z Running 4 items in this shard: test/inductor/test_fx_fusion.py::TestFxFusion::test_linear_permute_fusion, test/inductor/test_fx_fusion.py::TestFxFusion::test_permute_bmm_fusion, test/inductor/test_fx_fusion.py::TestFxFusion::test_permute_linear_fusion, test/inductor/test_fx_fusion.py::TestFxFusion::test_sink_cat_after_pointwise 2025-09-07T07:51:38.4130217Z 2025-09-07T07:51:38.4132336Z Running inductor/test_gpu_cpp_wrapper 1/1 ... [2025-09-07 07:51:38.412981] 2025-09-07T07:51:38.4132534Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:51:38.4133924Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:51:38.413184] 2025-09-07T07:57:42.5712246Z 2025-09-07T07:57:42.5714181Z inductor/test_gpu_cpp_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_3296d5e12ce15b37_.log 2025-09-07T07:57:42.5769786Z Running 294 items in this shard: test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex4_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_adding_tensor_offsets_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_addmm_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_aoti_debug_printer_works_on_constants, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_as_strided_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_batch_norm_2d_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bernoulli1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bitwise_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_buffer_use_after_remove_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_slice_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_consecutive_split_cumprod_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_conv_backward_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_convolution1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_1_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_bfloat16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int8_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_uint8_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_fusion_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_uint8_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_bfloat16_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_embedding_bag_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_enable_dynamic_shapes_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_real_output_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_foreach_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_put_deterministic_fallback_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_tensor_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_inductor_layout_optimization_input_mutations_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_insignificant_strides_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_layer_norm_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear_relu_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm2_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm3_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_views_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_device_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_threading_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_h_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_he_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pow3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_profiler_mark_wrapper_call_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_randint_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_reduction1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_relu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_repeat_interleave_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_roi_align_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scalar_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_efficient_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_silu_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sort_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_dtype_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_int_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_transpose_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_bfloat16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex4_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_adding_tensor_offsets_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_addmm_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_annotation_training, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_as_strided_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_batch_norm_2d_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bernoulli1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bitwise_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_buffer_use_after_remove_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_slice_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_consecutive_split_cumprod_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_conv_backward_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_convolution1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_bfloat16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int64_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float32_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_fusion_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int8_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_bfloat16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_embedding_bag_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_enable_dynamic_shapes_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_real_output_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_foreach_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_put_deterministic_fallback_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_tensor_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_inductor_layout_optimization_input_mutations_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_insignificant_strides_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_layer_norm_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear_relu_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm2_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm3_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_views_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_device_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_threading_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_h_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_he_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pow3_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_profiler_mark_wrapper_call_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_randint_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_reduction1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_relu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_repeat_interleave_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_roi_align_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scalar_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_efficient_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_silu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sort_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_dtype_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_int_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_transpose_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_bfloat16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_uint8_cuda_dynamic_shapes_gpu_wrapper 2025-09-07T07:57:42.5822023Z 2025-09-07T07:57:42.5822127Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T07:57:42.5822306Z Uploading artifacts took 0.00 seconds 2025-09-07T07:57:42.5822477Z Running inductor/test_layout_optim 1/1 ... [2025-09-07 07:57:42.571383] 2025-09-07T07:57:42.5822637Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:57:42.5823021Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_layout_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 07:57:42.571610] 2025-09-07T07:57:48.1844946Z 2025-09-07T07:57:48.1845773Z inductor/test_layout_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_layout_optim_1.1_db1264f18496984f_.log 2025-09-07T07:57:48.1846190Z Running 0 items in this shard: 2025-09-07T07:57:48.1846271Z 2025-09-07T07:57:48.1846411Z Running inductor/test_torchinductor_codegen_dynamic_shapes 3/4 ... [2025-09-07 07:57:48.184156] 2025-09-07T07:57:48.1847220Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T07:57:48.1847832Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 07:57:48.184364] 2025-09-07T08:04:27.0811754Z 2025-09-07T08:04:27.0813524Z inductor/test_torchinductor_codegen_dynamic_shapes 3/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_3.4_89c89024f82a9376_.log 2025-09-07T08:04:27.0911523Z Running 436 items in this shard: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex9_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_addmv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aliased_buffer_reuse_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_allow_reuse_active_if_under_peak_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin_with_duplicates_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_min_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool3d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool_errors_with_uint_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_baddbmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bernoulli1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bernoulli2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bfloat16_to_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bmm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_both_scalars_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_computed_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_default_kwargs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_int32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_nd_tiling_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_copied_in_graph_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_int_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_empty_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_chunk_recompiles_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_complex_memory_overlap_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_consecutive_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_fill_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_nd_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv2d_channels_last_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cos_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cumsum_no_mask_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_default_layout_constraint_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_scan_would_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_with_suffix_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_diagonal_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dist_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div7_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_trivial_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtype_mismatch_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_bfloat16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_float16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_exact_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_basic_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_list_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_flip_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_float32_to_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_full_like_transposed_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fuse_tiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gather1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gather_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_gelu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_constant_tensor2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_mutation_real_name_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_no_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_unbacked_symint_as_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_grid_sampler_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardtanh_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_horizonal_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_horizonal_fusion2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_as_masked_fill_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_select_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_indirect_load_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inductor_multiple_specializations_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_input_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_input_mutation5_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_isin_tensor_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_l1_loss_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_grid_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lerp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_like_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_like_rands_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear_dynamic_maxautotune_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linspace1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logaddexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_masked_fill_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_masked_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_matmul_layer_norm_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_min_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d7_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mul_index_expr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multi_gpu_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_prime_size_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_var_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_var_lowp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mutable_custom_op_fixed_layout_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_new_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_no_mega_fusion_during_lowering_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nonzero_unbacked_refinement_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_output_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pattern_matcher_multi_user_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_permute2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_j1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_digamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erfcx_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_expit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammainc_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_legendre_polynomial_p_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_psi_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_xlog1py_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow_by_natural_log2_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_profiler_mark_wrapper_call_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randint_kernel_count_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randn_generator_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_randn_like_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reflection_pad2d_backward_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_clone_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reuse_buffers_with_aliasing_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scaled_dot_product_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_add1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_reduce1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_reduce2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_searchsorted_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_searchsorted_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sgn_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_shape_prop_torch_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_silu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_simplify_loops_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_single_elem_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sizehint_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_view_with_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_special_polygamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_index_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_reduction_dynamic_shape_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_with_integer_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum_keepdims_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_tensor3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_tmp_not_defined_issue2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_device_constant_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_memory_format_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_transpose_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_bilinear2d_a_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_bilinear2d_b_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_cat_conv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_mean_div_by_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_view_as_real_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_view_uint8_through_differing_bitwidths_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_zero_dim_reductions_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_zeros_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_matmul_4bit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adding_tensor_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_addmv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_angle_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_aoti_eager_support_str_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin_with_duplicates_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin_with_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_min_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_assert_alignment_op_name_pass_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool_errors_with_uint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_baddbmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_batch_norm_2d_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bernoulli1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bitwise2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bitwise3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_add_autotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_nd_tiling_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_unbacked_empty_1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_complex_from_real_imag_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_complex_memory_overlap_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_config_option_dont_assume_alignment_cudagraphs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv2d_backward_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_convolution5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cos_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dropout_deterministic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtype_sympy_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_bfloat16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_bfloat16_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_float32_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_bag_byte_unpack_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_erfc_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expanded_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_expm1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_basic_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_forced_buffer_realize_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fractional_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fractional_max_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_boolean_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_like_sliced_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_like_transposed_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_full_truncation_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_functionalize_rng_wrappers_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fuse_tiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_constant_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_scalar_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_hardtanh_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_horizonal_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_nested_indirect_indexing_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_as_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_failed_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_index_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inductor_layout_optimization_input_mutations_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_activations_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_input_mutation5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_isin_tensor_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_isinf_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_layer_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linspace3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_logcumsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_logcumsumexp_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_long_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_min_max_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mix_device_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mm_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mul_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multi_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multi_gpu_recompile_on_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_any_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_sum_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_var_lowp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mutable_custom_op_fixed_layout_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mutations_loop_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_to_num_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_neg_max_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_no_mega_fusion_during_lowering_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_norm_constant_overflow_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_one_hot_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_output_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pad_cast_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pad_single_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_permute2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_airy_ai_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_erf_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_erfcx_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_gammainc_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_sinc_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_xlogy_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_prod_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction_config_limit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reflection_pad2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reinterpret_dtypeview_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_copy_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_slice1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_view_default_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_replication_pad_errors_with_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_roi_align_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_round_correctness_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_add1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_setitem_with_int_parameter_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_shape_padding_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_should_pad_bench_for_bmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sign_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sort_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_reduction_dynamic_shape_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_strided_inputs_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum_keepdims_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tensor_index_put_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_memory_format_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_topk_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_uint_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unbind_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bicubic2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_cat_conv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vectorized_ops_masked_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_xblock_divides_xnumel_dynamic_shapes_cuda 2025-09-07T08:04:27.0993291Z 2025-09-07T08:04:27.0993400Z Running inductor/test_torchinductor_opinfo 1/9 ... [2025-09-07 08:04:27.081848] 2025-09-07T08:04:27.0993585Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:04:27.0994026Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:04:27.082051] 2025-09-07T08:14:10.6184621Z 2025-09-07T08:14:10.6186335Z inductor/test_torchinductor_opinfo 1/9 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_1.9_3a265d97f0f3a734_.log 2025-09-07T08:14:10.6255380Z Running 394 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_inverse_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igammac_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rrelu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_kaiser_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_int32 2025-09-07T08:14:10.6317303Z 2025-09-07T08:14:10.6317439Z Running inductor/test_torchinductor_opinfo 7/9 ... [2025-09-07 08:14:10.618824] 2025-09-07T08:14:10.6317623Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:14:10.6318033Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=7', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:14:10.619021] 2025-09-07T08:22:57.1342382Z 2025-09-07T08:22:57.1343732Z inductor/test_torchinductor_opinfo 7/9 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_7.9_a29a41d53fac4367_.log 2025-09-07T08:22:57.1414000Z Running 391 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_xor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_xor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gcd_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geqrf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igammac_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svdvals_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logaddexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_norm_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gelu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rrelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_bartlett_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_cosine_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_indices_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_int64 2025-09-07T08:22:57.1474868Z 2025-09-07T08:22:57.1474971Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T08:22:57.1475159Z Uploading artifacts took 0.00 seconds 2025-09-07T08:22:57.1475326Z Running inductor/test_xpu_basic 1/1 ... [2025-09-07 08:22:57.134345] 2025-09-07T08:22:57.1481193Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:22:57.1481588Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_xpu_basic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:22:57.134615] 2025-09-07T08:23:03.3322116Z 2025-09-07T08:23:03.3323366Z inductor/test_xpu_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_xpu_basic_1.1_5d8e72ec5d8c9c4a_.log 2025-09-07T08:23:03.3324093Z 2025-09-07T08:23:03.3324377Z Running nn/test_pooling 1/1 ... [2025-09-07 08:23:03.332067] 2025-09-07T08:23:03.3326917Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:03.3327596Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'nn/test_pooling.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:23:03.332407] 2025-09-07T08:23:36.9345891Z 2025-09-07T08:23:36.9356159Z nn/test_pooling 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pooling_1.1_e935809bb1cc16b4_.log 2025-09-07T08:23:36.9374953Z Running 143 items in this shard: test/nn/test_pooling.py::TestAvgPool::test_avg_pool1d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool2d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool3d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d_with_divisor, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d_with_divisor, test/nn/test_pooling.py::TestPoolingNN::test_MaxUnpool2d_output_size, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_nhwc_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_backward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_forward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_non_contiguous, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_lower_precision, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_none, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_overflow, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool2d_nhwc_cpu, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool3d_input_check, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool1d_empty_kernel, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool3d, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool2d_empty_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool3d_backward_after_cat_dim1_device_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_batch_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case10_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case4_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case5_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case6_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case7_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case8_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case9_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool2d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool3d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_max_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pool_odd_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_uint8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool3d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int32, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_with_indices_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool3d_non_square_backward_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_large_size_int64_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_size_one_feature_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_bfloat16_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_zero_stride_cuda 2025-09-07T08:23:36.9392809Z 2025-09-07T08:23:36.9392896Z Running profiler/test_profiler_tree 1/1 ... 
[2025-09-07 08:23:36.934788] 2025-09-07T08:23:36.9393071Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:36.9393463Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_profiler_tree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:23:36.934997] 2025-09-07T08:23:39.2042056Z 2025-09-07T08:23:39.2043837Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_4b2c285b3122d71b_.log 2025-09-07T08:23:39.2047231Z Running 10 items in this shard: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_detailed, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_with_stream, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory_and_stack, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_record_function, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_dispatch, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_function 2025-09-07T08:23:39.2049899Z 2025-09-07T08:23:39.2050043Z Running profiler/test_python_tracer 1/1 ... 
[2025-09-07 08:23:39.203927] 2025-09-07T08:23:39.2050303Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:39.2050907Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_python_tracer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:23:39.204174] 2025-09-07T08:23:46.0830847Z 2025-09-07T08:23:46.0842031Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_c49ad684e6059ca1_.log 2025-09-07T08:23:46.0843680Z Running 3 items in this shard: test/profiler/test_python_tracer.py::TestPythonTracer::test_method_with_c_function, test/profiler/test_python_tracer.py::TestPythonTracer::test_monitoring_callback, test/profiler/test_python_tracer.py::TestPythonTracer::test_unexpected_c_return_events 2025-09-07T08:23:46.0844961Z 2025-09-07T08:23:46.0845198Z Running profiler/test_record_function 1/1 ... [2025-09-07 08:23:46.082859] 2025-09-07T08:23:46.0845501Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:46.0846157Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_record_function.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:23:46.083180] 2025-09-07T08:23:48.1524484Z 2025-09-07T08:23:48.1526473Z profiler/test_record_function 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_record_function_1.1_49437f441503d29e_.log 2025-09-07T08:23:48.1528992Z Running 4 items in this shard: test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_delegation_with_profiler, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function_fork, test/profiler/test_record_function.py::TestRecordFunction::test_record_function 2025-09-07T08:23:48.1530630Z 2025-09-07T08:23:48.1530894Z Running profiler/test_torch_tidy 1/1 ... [2025-09-07 08:23:48.152413] 2025-09-07T08:23:48.1531347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:48.1540051Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_torch_tidy.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:23:48.152717] 2025-09-07T08:23:54.5305871Z 2025-09-07T08:23:54.5308292Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_e10277572ed7b8cb_.log 2025-09-07T08:23:54.5315880Z Running 22 items in this shard: test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_id_uniqueness, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids_with_other_ops, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocations, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_extra_fields, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_impl_reuse, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_mkldnn_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_module_and_optimizer_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_nnmodule_params, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_adam, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_sgd, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_pointers_and_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_refcounts, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_scalar_ins, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_sparse_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_lists, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_properties, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_full, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_keep_alive, 
test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_scalar_args, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_set 2025-09-07T08:23:54.5319452Z 2025-09-07T08:23:54.5319548Z Running test_accelerator 1/1 ... [2025-09-07 08:23:54.530530] 2025-09-07T08:23:54.5319753Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:54.5320240Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_accelerator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:23:54.530777] 2025-09-07T08:23:57.0007000Z 2025-09-07T08:23:57.0008607Z test_accelerator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_accelerator_1.1_bd89ccfeb80a5d6b_.log 2025-09-07T08:23:57.0012341Z Running 11 items in this shard: test/test_accelerator.py::TestAccelerator::test_current_accelerator, test/test_accelerator.py::TestAccelerator::test_current_stream_query, test/test_accelerator.py::TestAccelerator::test_device_context_manager, test/test_accelerator.py::TestAccelerator::test_generic_event_behavior, test/test_accelerator.py::TestAccelerator::test_generic_multi_device_behavior, test/test_accelerator.py::TestAccelerator::test_generic_stream_behavior, test/test_accelerator.py::TestAccelerator::test_memory_stats, test/test_accelerator.py::TestAccelerator::test_multi_device_context_manager, test/test_accelerator.py::TestAccelerator::test_multi_device_stream_context_manager, test/test_accelerator.py::TestAccelerator::test_pin_memory_on_non_blocking_copy, test/test_accelerator.py::TestAccelerator::test_stream_context_manager 2025-09-07T08:23:57.0014674Z 2025-09-07T08:23:57.0014754Z Running test_autocast 1/1 ... 
[2025-09-07 08:23:57.000707] 2025-09-07T08:23:57.0014917Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:23:57.0015287Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:23:57.000926] 2025-09-07T08:24:04.5332333Z 2025-09-07T08:24:04.5341559Z test_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autocast_1.1_9029d268d513a469_.log 2025-09-07T08:24:04.5345093Z Running 20 items in this shard: test/test_autocast.py::TestAutocastCPU::test_autocast_disabled_with_fp32_dtype, test/test_autocast.py::TestAutocastCPU::test_autocast_methods_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_16, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_rnn, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_16, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_need_autocast_promote, test/test_autocast.py::TestAutocastCPU::test_cpu_autocast_deprecated_warning, test/test_autocast.py::TestAutocastCPU::test_generic_autocast, test/test_autocast.py::TestAutocastGPU::test_autocast_prioritize, test/test_autocast.py::TestAutocastGPU::test_cache_disabled, test/test_autocast.py::TestAutocastGPU::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_bfloat16_supported, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_error_message, test/test_autocast.py::TestTorchAutocast::test_autocast_fast_dtype, test/test_autocast.py::TestTorchAutocast::test_invalid_device, 
test/test_autocast.py::TestTorchAutocast::test_non_string_device 2025-09-07T08:24:04.5347681Z 2025-09-07T08:24:04.5347784Z Running test_autograd_fallback 1/1 ... [2025-09-07 08:24:04.533114] 2025-09-07T08:24:04.5347989Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:24:04.5348477Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_autograd_fallback.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:24:04.533396] 2025-09-07T08:24:06.5525419Z 2025-09-07T08:24:06.5526786Z test_autograd_fallback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_fallback_1.1_1c5b3116d97f4cf7_.log 2025-09-07T08:24:06.5536020Z Running 28 items in this shard: test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_nothing, 
test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_warn 2025-09-07T08:24:06.5547283Z 2025-09-07T08:24:06.5547372Z Running test_autoload 1/1 ... 
[2025-09-07 08:24:06.552485] 2025-09-07T08:24:06.5547537Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:24:06.5547919Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_autoload.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:24:06.552728] 2025-09-07T08:24:08.5713548Z 2025-09-07T08:24:08.5714697Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_bb37349a4c7289f5_.log 2025-09-07T08:24:08.5715703Z Running 1 items in this shard: test/test_autoload.py::TestDeviceBackendAutoload::test_autoload 2025-09-07T08:24:08.5716124Z 2025-09-07T08:24:08.5717037Z Running test_binary_ufuncs 1/1 ... [2025-09-07 08:24:08.571380] 2025-09-07T08:24:08.5717591Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:24:08.5719082Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_binary_ufuncs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:24:08.571628] 2025-09-07T08:26:47.4239327Z 2025-09-07T08:26:47.4240155Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_1a7894174083bdba_.log 2025-09-07T08:26:47.6257234Z Running 12857 items in this shard: test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_broadcast_empty_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_with_tail_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addcmul_scalars_as_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addsub_half_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_edgecases_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_scalar_device_unspecified_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_ops_with_scalars_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bool_tensor_comparison_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_xor_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cmul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpu_tensor_pow_cuda_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cremainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_binary_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_inplace_error_msg_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_csub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cuda_tensor_pow_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cumulative_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_script_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divmul_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_exceptions_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_idiv_and_ifloordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_division_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_dunders_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_and_float_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_tensor_pow_neg_ints_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_with_nontrivial_alignment_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_long_tensor_pow_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_forward_ad_float32_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_chalf_tensor_and_cpu_scalar_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_bfloat16_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rsub___cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_heaviside_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_xlog1py_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ge_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_mul_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_v_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_out_resize_warning_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_inplace_resizing_exception_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_base_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_overloads_mem_overlap_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex128, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bfloat16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int16, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bool, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_overflow_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rpow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gcd_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_typing_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_tensor_pow_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___radd___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rand___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rdiv___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmod___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmul___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___ror___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rpow___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rsub___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rxor___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_or_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igammac_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_add_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_heaviside_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_return_by_ref_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_max_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_min_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_pow_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_h_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_he_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_laguerre_polynomial_l_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_legendre_polynomial_p_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_bfloat16_cuda, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float32, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_gradients_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int64, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_uint8, 
test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_uint8
2025-09-07T08:26:47.8140979Z
2025-09-07T08:26:47.8141103Z Running test_ci_sanity_check_fail 1/1 ... [2025-09-07 08:26:47.434200]
2025-09-07T08:26:47.8141282Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T08:26:47.8147781Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_ci_sanity_check_fail.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:26:47.434461]
2025-09-07T08:26:55.6535799Z Running test_decomp 2/12 ... [2025-09-07 08:26:55.652929]
2025-09-07T08:26:55.6536072Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T08:26:55.6544323Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_decomp.py', '--shard-id=2', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ...
[2025-09-07 08:26:55.653184]
2025-09-07T08:37:21.2986227Z
2025-09-07T08:37:21.2987199Z test_decomp 2/12 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_2.12_5c7cf42e02c1d5d7_.log
2025-09-07T08:37:21.3077264Z Running 769 items in this shard: test/test_decomp.py::TestDecompCUDA::test_arange_graph_cuda, test/test_decomp.py::TestDecompCUDA::test_bernoulli_default_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex32,
test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float8_e4m3fn, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanquantile_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polar_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_qr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_bartlett_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_indices_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_squeeze_multiple_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e4m3fnuz, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e5m2fnuz, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_native_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_uniform_cuda, test/test_decomp.py::DecompOneOffTestsCUDA::test_native_layer_norm_cpu_decomp_cuda, test/test_decomp.py::HasDecompTest::test_has_decomposition
2025-09-07T08:37:21.3155953Z
2025-09-07T08:37:21.3156029Z Running test_decomp 8/12 ... [2025-09-07 08:37:21.299284]
2025-09-07T08:37:21.3156179Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T08:37:21.3156619Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_decomp.py', '--shard-id=8', '--num-shards=12', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:37:21.299507]
2025-09-07T08:44:46.7213445Z
2025-09-07T08:44:46.7215061Z test_decomp 8/12 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_8.12_dcbf53cab318e3e3_.log
2025-09-07T08:44:46.7291977Z Running 706 items in this shard: test/test_decomp.py::TestDecompCUDA::test_broadcasting_index_copy_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__batch_norm_with_update_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_uint8,
test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float8_e5m2fnuz, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geqrf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_igamma_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_triangular_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unbind_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_igammac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_train_mode_cuda_float32 2025-09-07T08:44:46.7363894Z 2025-09-07T08:44:46.7364034Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T08:44:46.7364245Z Uploading artifacts took 0.00 seconds 2025-09-07T08:44:46.7364402Z Running test_function_schema 1/1 ... [2025-09-07 08:44:46.721867] 2025-09-07T08:44:46.7370619Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:44:46.7371020Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_function_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:44:46.722090] 2025-09-07T08:44:48.8411534Z 2025-09-07T08:44:48.8412756Z test_function_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_function_schema_1.1_909a45157dac8aed_.log 2025-09-07T08:44:48.8418838Z Running 15 items in this shard: test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_arguments, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_outputs, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_structure, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_with_smart_serialization, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_real_use_case, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_with_out, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_without_out, test/test_function_schema.py::TestFunctionSchema::test_hash_schema, test/test_function_schema.py::TestFunctionSchema::test_out_schema, test/test_function_schema.py::TestFunctionSchema::test_schema_error, test/test_function_schema.py::TestFunctionSchema::test_serialize_and_deserialize, test/test_function_schema.py::TestFunctionSchema::test_string_optional_parameter_default_value, test/test_function_schema.py::TestFunctionSchema::test_sym_int_argument_properly_parsed, test/test_function_schema.py::TestFunctionSchema::test_tensor_list_alias_annotation_properly_parsed, test/test_function_schema.py::TestFunctionSchema::test_tensor_option_arguments_properly_parsed 2025-09-07T08:44:48.8423244Z 2025-09-07T08:44:48.8423488Z Running test_functional_autograd_benchmark 1/1 ... 
[2025-09-07 08:44:48.841186] 2025-09-07T08:44:48.8423899Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:44:48.8424836Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_functional_autograd_benchmark.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:44:48.841492] 2025-09-07T08:45:08.2432242Z 2025-09-07T08:45:08.2433479Z test_functional_autograd_benchmark 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_autograd_benchmark_1.1_3b70a536e5198e83_.log 2025-09-07T08:45:08.2435188Z Running 2 items in this shard: test/test_functional_autograd_benchmark.py::TestFunctionalAutogradBenchmark::test_fast_tasks, test/test_functional_autograd_benchmark.py::TestFunctionalAutogradBenchmark::test_slow_tasks 2025-09-07T08:45:08.2436087Z 2025-09-07T08:45:08.2436325Z Running test_functional_optim 1/1 ... [2025-09-07 08:45:08.243079] 2025-09-07T08:45:08.2436933Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:45:08.2437855Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_functional_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:45:08.243364] 2025-09-07T08:45:10.4127207Z 2025-09-07T08:45:10.4128508Z test_functional_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_optim_1.1_963bba2e0f7148c8_.log 2025-09-07T08:45:10.4130506Z Running 4 items in this shard: test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam_w, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_sgd, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_registration 2025-09-07T08:45:10.4131631Z 2025-09-07T08:45:10.4131800Z Running test_functionalization 1/1 ... [2025-09-07 08:45:10.412665] 2025-09-07T08:45:10.4132217Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:45:10.4132914Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_functionalization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 08:45:10.412883] 2025-09-07T08:45:17.5543066Z 2025-09-07T08:45:17.5553808Z test_functionalization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functionalization_1.1_4554425af5d58be7_.log 2025-09-07T08:45:17.5570506Z Running 112 items in this shard: test/test_functionalization.py::TestFunctionalization::test_advanced_indexing, test/test_functionalization.py::TestFunctionalization::test_advanced_indexing_correct_strides, test/test_functionalization.py::TestFunctionalization::test_aliases_maintained_after_pass_when_reapplying_views, test/test_functionalization.py::TestFunctionalization::test_as_strided, test/test_functionalization.py::TestFunctionalization::test_batch_norm, test/test_functionalization.py::TestFunctionalization::test_cat, test/test_functionalization.py::TestFunctionalization::test_channels_last_contiguous, test/test_functionalization.py::TestFunctionalization::test_copy_, test/test_functionalization.py::TestFunctionalization::test_copy_stride_mismatch, test/test_functionalization.py::TestFunctionalization::test_diagonal, test/test_functionalization.py::TestFunctionalization::test_diagonal_mutated_input, test/test_functionalization.py::TestFunctionalization::test_everything, test/test_functionalization.py::TestFunctionalization::test_expand_symint, test/test_functionalization.py::TestFunctionalization::test_fill_, test/test_functionalization.py::TestFunctionalization::test_freeze, test/test_functionalization.py::TestFunctionalization::test_index_mutation_on_non_input, test/test_functionalization.py::TestFunctionalization::test_inplace_on_non_view, test/test_functionalization.py::TestFunctionalization::test_instance_norm, test/test_functionalization.py::TestFunctionalization::test_metadata_change, test/test_functionalization.py::TestFunctionalization::test_metadata_change_out_op, test/test_functionalization.py::TestFunctionalization::test_mixed_wrappers_invalid, 
test/test_functionalization.py::TestFunctionalization::test_mixed_wrappers_valid, test/test_functionalization.py::TestFunctionalization::test_multi_out, test/test_functionalization.py::TestFunctionalization::test_multiple_views_of_same_base, test/test_functionalization.py::TestFunctionalization::test_mutable_op_not_inplace_or_other, test/test_functionalization.py::TestFunctionalization::test_mutation_overlapping_mem, test/test_functionalization.py::TestFunctionalization::test_nested_functions_propagate_updates, test/test_functionalization.py::TestFunctionalization::test_only_one_view, test/test_functionalization.py::TestFunctionalization::test_optional_tensor_list, test/test_functionalization.py::TestFunctionalization::test_python_functionalization, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_conj, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_is_conj, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_is_neg, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_lift_fresh, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_lift_fresh_storage, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_neg, test/test_functionalization.py::TestFunctionalization::test_python_functionalization_zero_tensor, test/test_functionalization.py::TestFunctionalization::test_reapply_views_simple, test/test_functionalization.py::TestFunctionalization::test_resize_larger_invalid, test/test_functionalization.py::TestFunctionalization::test_resize_larger_valid, test/test_functionalization.py::TestFunctionalization::test_resize_same_size_diff_rank, test/test_functionalization.py::TestFunctionalization::test_resize_smaller, test/test_functionalization.py::TestFunctionalization::test_save_for_backwards_segfault, test/test_functionalization.py::TestFunctionalization::test_scalars, 
test/test_functionalization.py::TestFunctionalization::test_set_, test/test_functionalization.py::TestFunctionalization::test_simple, test/test_functionalization.py::TestFunctionalization::test_simple_out, test/test_functionalization.py::TestFunctionalization::test_slice, test/test_functionalization.py::TestFunctionalization::test_split, test/test_functionalization.py::TestFunctionalization::test_split_with_sizes, test/test_functionalization.py::TestFunctionalization::test_tensor_ctr, test/test_functionalization.py::TestFunctionalization::test_tensor_list_composite, test/test_functionalization.py::TestFunctionalization::test_tensor_list_mixed_functional_nonfunctional, test/test_functionalization.py::TestFunctionalization::test_unbind, test/test_functionalization.py::TestFunctionalization::test_view_clone_view_inplace, test/test_functionalization.py::TestFunctionalization::test_view_inplace, test/test_functionalization.py::TestCrossRefFunctionalization::test_advanced_indexing, test/test_functionalization.py::TestCrossRefFunctionalization::test_advanced_indexing_correct_strides, test/test_functionalization.py::TestCrossRefFunctionalization::test_aliases_maintained_after_pass_when_reapplying_views, test/test_functionalization.py::TestCrossRefFunctionalization::test_as_strided, test/test_functionalization.py::TestCrossRefFunctionalization::test_batch_norm, test/test_functionalization.py::TestCrossRefFunctionalization::test_cat, test/test_functionalization.py::TestCrossRefFunctionalization::test_channels_last_contiguous, test/test_functionalization.py::TestCrossRefFunctionalization::test_copy_, test/test_functionalization.py::TestCrossRefFunctionalization::test_copy_stride_mismatch, test/test_functionalization.py::TestCrossRefFunctionalization::test_diagonal, test/test_functionalization.py::TestCrossRefFunctionalization::test_diagonal_mutated_input, test/test_functionalization.py::TestCrossRefFunctionalization::test_everything, 
test/test_functionalization.py::TestCrossRefFunctionalization::test_expand_symint, test/test_functionalization.py::TestCrossRefFunctionalization::test_fill_, test/test_functionalization.py::TestCrossRefFunctionalization::test_freeze, test/test_functionalization.py::TestCrossRefFunctionalization::test_index_mutation_on_non_input, test/test_functionalization.py::TestCrossRefFunctionalization::test_inplace_on_non_view, test/test_functionalization.py::TestCrossRefFunctionalization::test_instance_norm, test/test_functionalization.py::TestCrossRefFunctionalization::test_metadata_change, test/test_functionalization.py::TestCrossRefFunctionalization::test_metadata_change_out_op, test/test_functionalization.py::TestCrossRefFunctionalization::test_mixed_wrappers_invalid, test/test_functionalization.py::TestCrossRefFunctionalization::test_mixed_wrappers_valid, test/test_functionalization.py::TestCrossRefFunctionalization::test_multi_out, test/test_functionalization.py::TestCrossRefFunctionalization::test_multiple_views_of_same_base, test/test_functionalization.py::TestCrossRefFunctionalization::test_mutable_op_not_inplace_or_other, test/test_functionalization.py::TestCrossRefFunctionalization::test_mutation_overlapping_mem, test/test_functionalization.py::TestCrossRefFunctionalization::test_nested_functions_propagate_updates, test/test_functionalization.py::TestCrossRefFunctionalization::test_only_one_view, test/test_functionalization.py::TestCrossRefFunctionalization::test_optional_tensor_list, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_conj, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_is_conj, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_is_neg, 
test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_lift_fresh, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_lift_fresh_storage, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_neg, test/test_functionalization.py::TestCrossRefFunctionalization::test_python_functionalization_zero_tensor, test/test_functionalization.py::TestCrossRefFunctionalization::test_reapply_views_simple, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_larger_invalid, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_larger_valid, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_same_size_diff_rank, test/test_functionalization.py::TestCrossRefFunctionalization::test_resize_smaller, test/test_functionalization.py::TestCrossRefFunctionalization::test_save_for_backwards_segfault, test/test_functionalization.py::TestCrossRefFunctionalization::test_scalars, test/test_functionalization.py::TestCrossRefFunctionalization::test_set_, test/test_functionalization.py::TestCrossRefFunctionalization::test_simple, test/test_functionalization.py::TestCrossRefFunctionalization::test_simple_out, test/test_functionalization.py::TestCrossRefFunctionalization::test_slice, test/test_functionalization.py::TestCrossRefFunctionalization::test_split, test/test_functionalization.py::TestCrossRefFunctionalization::test_split_with_sizes, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_ctr, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_list_composite, test/test_functionalization.py::TestCrossRefFunctionalization::test_tensor_list_mixed_functional_nonfunctional, test/test_functionalization.py::TestCrossRefFunctionalization::test_unbind, test/test_functionalization.py::TestCrossRefFunctionalization::test_view_clone_view_inplace, 
test/test_functionalization.py::TestCrossRefFunctionalization::test_view_inplace 2025-09-07T08:45:17.5584064Z 2025-09-07T08:45:17.5584159Z Running test_functionalization_of_rng_ops 1/1 ... [2025-09-07 08:45:17.554506] 2025-09-07T08:45:17.5584346Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:45:17.5584750Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_functionalization_of_rng_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:45:17.554725] 2025-09-07T08:45:24.4321826Z 2025-09-07T08:45:24.4322833Z test_functionalization_of_rng_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functionalization_of_rng_ops_1.1_52b5c24071a54dc1_.log 2025-09-07T08:45:24.4327375Z Running 10 items in this shard: test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_autograd_function_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_checkpoint_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_dropout_decomp_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_min_cut_partitioner_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_multiple_subgraphs_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_rand_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_rand_like_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_rand_like_dynamic_bwd_cuda_float32, test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_rand_like_dynamic_cuda_float32, 
test/test_functionalization_of_rng_ops.py::TestFunctionalizationRngOpsCUDA::test_set_get_rng_state_cuda_float32 2025-09-07T08:45:24.4330926Z 2025-09-07T08:45:24.4331074Z Running test_futures 1/1 ... [2025-09-07 08:45:24.431943] 2025-09-07T08:45:24.4331385Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:45:24.4332257Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_futures.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:45:24.432141] 2025-09-07T08:45:27.0518523Z 2025-09-07T08:45:27.0528330Z test_futures 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_futures_1.1_9b7f4c728eb0be7a_.log 2025-09-07T08:45:27.0531641Z Running 22 items in this shard: test/test_futures.py::TestFuture::test_add_done_callback_error_is_ignored, test/test_futures.py::TestFuture::test_add_done_callback_maintains_callback_order, test/test_futures.py::TestFuture::test_add_done_callback_no_arg_error_is_ignored, test/test_futures.py::TestFuture::test_add_done_callback_simple, test/test_futures.py::TestFuture::test_chained_then, test/test_futures.py::TestFuture::test_collect_all, test/test_futures.py::TestFuture::test_done, test/test_futures.py::TestFuture::test_done_exception, test/test_futures.py::TestFuture::test_interleaving_then_and_add_done_callback_maintains_callback_order, test/test_futures.py::TestFuture::test_interleaving_then_and_add_done_callback_propagates_error, test/test_futures.py::TestFuture::test_mark_future_twice, test/test_futures.py::TestFuture::test_pickle_future, test/test_futures.py::TestFuture::test_set_exception, test/test_futures.py::TestFuture::test_set_exception_multithreading, test/test_futures.py::TestFuture::test_then, test/test_futures.py::TestFuture::test_then_no_arg, test/test_futures.py::TestFuture::test_then_raise, test/test_futures.py::TestFuture::test_then_wrong_arg, 
test/test_futures.py::TestFuture::test_wait, test/test_futures.py::TestFuture::test_wait_all, test/test_futures.py::TestFuture::test_wait_multi_thread, test/test_futures.py::TestFuture::test_wait_none 2025-09-07T08:45:27.0534532Z 2025-09-07T08:45:27.0534637Z Running test_fx 1/3 ... [2025-09-07 08:45:27.051716] 2025-09-07T08:45:27.0534854Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:45:27.0535402Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_fx.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:45:27.051936] 2025-09-07T08:55:36.0646329Z 2025-09-07T08:55:36.0659172Z test_fx 1/3 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_1.3_4c74184520d35d44_.log 2025-09-07T08:55:36.0716108Z Running 431 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cuda, test/test_fx.py::TestCSEPass::test_banned_list, test/test_fx.py::TestCSEPass::test_immutable_list_multiple_entries, test/test_fx.py::TestCSEPass::test_kwarg, test/test_fx.py::TestCSEPass::test_nested_immutable_list_type, test/test_fx.py::TestCSEPass::test_simple, test/test_fx.py::TestCSEPass::test_simple_2, test/test_fx.py::TestCSEPass::test_two_args_default, test/test_fx.py::TestDCE::test_impure_custom, test/test_fx.py::TestDCE::test_impure_nodes_args, test/test_fx.py::TestDCE::test_impure_random, test/test_fx.py::TestDCE::test_keep_setitem, test/test_fx.py::TestDCE::test_keep_torch_assert, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr_three_input, 
test/test_fx.py::TestConstFold::test_const_fold_tensor_meta, test/test_fx.py::TestConstFold::test_three_outputs, test/test_fx.py::TestConstFold::test_two_outputs, test/test_fx.py::AnnotationsTest::test_consistency, test/test_fx.py::AnnotationsTest::test_precision, test/test_fx.py::TypeCheckerTest::test_resnet50, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_scalar, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_false, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_types, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_true, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_true, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_attributes, test/test_fx.py::TestPassManager::test_pass_manager, test/test_fx.py::TestPassManager::test_pass_manager_checks, test/test_fx.py::TestPassManager::test_pass_manager_error, test/test_fx.py::TestPassManager::test_this_before_that_pass_constraint, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_True, test/test_fx.py::TestSubgraphRewriter::test_replacement_with_attrs, 
test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_args, test/test_fx.py::TestFX::test_annotation_with_future, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_internal_forward_references, test/test_fx.py::TestFX::test_args_kwargs, test/test_fx.py::TestFX::test_assert, test/test_fx.py::TestFX::test_ast_rewriter_wrap, test/test_fx.py::TestFX::test_ast_rewriter_wrap_with_submodule, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_autowrap_functions, test/test_fx.py::TestFX::test_concrete_arg_none_assert, test/test_fx.py::TestFX::test_construct_root_dict, test/test_fx.py::TestFX::test_copy_no_remap, test/test_fx.py::TestFX::test_custom_codegen_with_transformer, test/test_fx.py::TestFX::test_custom_proxy_type_literal, test/test_fx.py::TestFX::test_custom_traceback_not_raised_when_exception_source_is_submodule, test/test_fx.py::TestFX::test_deepcopy_recursion_depth, test/test_fx.py::TestFX::test_deepcopy_with_submods_params, test/test_fx.py::TestFX::test_ellipsis, test/test_fx.py::TestFX::test_empty_graph_codegen, test/test_fx.py::TestFX::test_erase_node_error, test/test_fx.py::TestFX::test_fx_create_arg, test/test_fx.py::TestFX::test_fx_stateless, test/test_fx.py::TestFX::test_get_torch_func_signature, test/test_fx.py::TestFX::test_graph_edit_with_proxy, test/test_fx.py::TestFX::test_graph_fns, test/test_fx.py::TestFX::test_graph_module, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_dict_init, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_mod_init, test/test_fx.py::TestFX::test_graph_unique_names_manual, test/test_fx.py::TestFX::test_informative_co_filename, 
test/test_fx.py::TestFX::test_insertion_point, test/test_fx.py::TestFX::test_interpreter_not_enough_args, test/test_fx.py::TestFX::test_interpreter_partial_eval, test/test_fx.py::TestFX::test_interpreter_star_args, test/test_fx.py::TestFX::test_leaf_module, test/test_fx.py::TestFX::test_matmul_tracing, test/test_fx.py::TestFX::test_multi_insert_point, test/test_fx.py::TestFX::test_multiple_default_args, test/test_fx.py::TestFX::test_node_tagging, test/test_fx.py::TestFX::test_nonetype_annotation, test/test_fx.py::TestFX::test_partial_trace, test/test_fx.py::TestFX::test_pickle_custom_import, test/test_fx.py::TestFX::test_prepend_self, test/test_fx.py::TestFX::test_pretty_print_node, test/test_fx.py::TestFX::test_proxy_deepcopy_without_tracer, test/test_fx.py::TestFX::test_reassign_args_kwargs_uses, test/test_fx.py::TestFX::test_remove_uses, test/test_fx.py::TestFX::test_reserved_getattr, test/test_fx.py::TestFX::test_script_method_trace, test/test_fx.py::TestFX::test_sequential, test/test_fx.py::TestFX::test_shape_prop_unbacked_sym, test/test_fx.py::TestFX::test_symbolic_trace_sequential, test/test_fx.py::TestFX::test_tensor_constant, test/test_fx.py::TestFX::test_torch_fx_getattr, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx, test/test_fx.py::TestFX::test_trace_dict_int_keys, test/test_fx.py::TestFX::test_trace_return_dataclass, test/test_fx.py::TestFX::test_trace_return_dataclass_nested, test/test_fx.py::TestFX::test_tracing_graphmodules_as_leaf_submodules, test/test_fx.py::TestFX::test_transformer_multi_outputs, test/test_fx.py::TestFX::test_typename_print, test/test_fx.py::TestFX::test_unpack, test/test_fx.py::TestFX::test_update_kwargs_api, test/test_fx.py::TestFX::test_varargs_concrete, test/test_fx.py::TestFX::test_wrap_with_submodule, test/test_fx.py::TestFX::test_wrapped_retrace, test/test_fx.py::TestFX::test_wrapped_via_decorator, test/test_fx.py::TestFX::test_wrong_target_type, 
test/test_fx.py::TestFXAPIBackwardCompatibility::test_preserve_unused_attr_after_unpickle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_affine_grid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_ctc_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_feature_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_group_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gumbel_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_margin_ranking_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool3d, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_native_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmin, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softplus, test/test_fx.py::TestFunctionalTracing::test_nn_functional_unfold, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_T_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___radd___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rpow___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rsub___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_abs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcdiv_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_all_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_angle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argsort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bernoulli_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bool_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_byte_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cartesian_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cauchy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ceil_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clone_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_combinations_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_copysign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_corrcoef_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_count_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diff_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_einsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_equal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eye_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fliplr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_like_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ge_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geometric_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gradient_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isneginf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isreal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kron_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kthvalue_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_le_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lgamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_det_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eig_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvals_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vecdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logcumsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_not_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_unpack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mH_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matrix_exp_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_multinomial_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanquantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ne_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nextafter_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_instance_norm_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_reflect_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_nuc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pca_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pinverse_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_positive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rad2deg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reciprocal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_roll_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sgn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_gaussian_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_to_size_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_along_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_topk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapz_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tril_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_uniform_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_consecutive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_where_cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet169, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_320_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssd300_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b2, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b3, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b6, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b7, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_l, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_maxvit_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_128gf, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet18, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_32x8d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_lraspp_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v1_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_h_14, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet101_2 2025-09-07T08:55:36.0769789Z 2025-09-07T08:55:36.0769860Z Running test_meta 1/2 ... 
[2025-09-07 08:55:36.065926] 2025-09-07T08:55:36.0770010Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T08:55:36.0770429Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_meta.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 08:55:36.066166] 2025-09-07T09:03:22.7306070Z 2025-09-07T09:03:22.7308067Z test_meta 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_1.2_28884a4427d3f12e_.log 2025-09-07T09:03:22.9676075Z Running 20297 items in this shard: test/test_meta.py::TestMetaConverter::test_complex_noncontiguous_bug, test/test_meta.py::TestMetaConverter::test_inplace_set_storage, test/test_meta.py::TestMetaConverter::test_leaf, test/test_meta.py::TestMetaConverter::test_non_leaf, test/test_meta.py::TestMetaConverter::test_requires_grad_false, test/test_meta.py::TestMetaConverter::test_tensor_outlives_converter, test/test_meta.py::TestMetaConverter::test_view_as_complex, test/test_meta.py::TestMetaConverter::test_view_as_real, test/test_meta.py::TestMetaConverter::test_view_mutate, test/test_meta.py::TestMetaConverter::test_view_of_leaf, test/test_meta.py::TestMetaConverter::test_view_of_non_leaf, test/test_meta.py::TestMetaConverter::test_weakref, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask1_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_copysign_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_floor_rounding_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_xlog1py_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_cdist_forward_cuda, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igammac_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorsolve_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_unpack_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanquantile_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_dropout_backward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pdist_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_complex_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_householder_product_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_solve_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_qr_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanquantile_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_dropout_backward_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pdist_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_smooth_l1_loss_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_max_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_tanh_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atanh_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_contiguous_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_exp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flipud_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isinf_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lu_solve_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_select_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nansum_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_binary_cross_entropy_with_logits_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_local_response_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pairwise_distance_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_upsample_bilinear_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_renorm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_k1_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trapz_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_vstack_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pca_lowrank_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_general_hamming_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_complex_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_lengths_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rxor___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_zero_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_as_strided_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdist_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagflat_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ifftn_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gt_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lcm_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_power_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logaddexp2_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_scatter_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mul_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_celu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_hardshrink_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pad_constant_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ormqr_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_rsqrt_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_y1_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_triu_indices_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorsolve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout2d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pdist_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_soft_margin_loss_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_nearest_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_embedding_bag_byte_prepack_cuda, test/test_meta.py::TestMetaCUDA::test_embedding_bag_byte_unpack_cuda, test/test_meta.py::TestMetaCUDA::test_embedding_bag_dense_backward_mode_2_cuda, test/test_meta.py::TestMetaCUDA::test_empty_quantized_cuda, 
test/test_meta.py::TestMetaCUDA::test_fill__alias_relationship_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask2_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask5_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask6_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask7_cuda, test/test_meta.py::TestMetaCUDA::test_huber_loss_backward_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask2_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask7_cuda, test/test_meta.py::TestMetaCUDA::test_map_location_deserialize_cuda, test/test_meta.py::TestMetaCUDA::test_meta__fused_moving_avg_obs_fq_helper_cuda, test/test_meta.py::TestMetaCUDA::test_meta_autograd_no_error_cuda, test/test_meta.py::TestMetaCUDA::test_meta_consistency_out_dtype_mismatch_pow_Tensor_Scalar_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_uint32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_grad_oriented_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_alpha_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_number_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_cosine_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_ctc_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_neg_3_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_uint32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_nonzero_cuda, test/test_meta.py::TestMetaCUDA::test_segment_reduce_backward_cuda, test/test_meta.py::TestMetaCUDA::test_triangular_solve_out_cuda 2025-09-07T09:03:23.1938645Z 2025-09-07T09:03:23.1938737Z Running test_numa_binding 1/1 ... [2025-09-07 09:03:22.744933] 2025-09-07T09:03:23.1938902Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:03:23.1939283Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_numa_binding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 09:03:22.745151] 2025-09-07T09:03:25.9168666Z 2025-09-07T09:03:25.9169726Z test_numa_binding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numa_binding_1.1_b6cfe53cc8cc16c8_.log 2025-09-07T09:03:25.9176106Z Running 19 items in this shard: test/test_numa_binding.py::NumaBindingTest::test_binds_to_node_0_if_node_stored_as_minus_one, test/test_numa_binding.py::NumaBindingTest::test_callable_entrypoint_basic, test/test_numa_binding.py::NumaBindingTest::test_core_complex_numa_binding_with_extra_l3, test/test_numa_binding.py::NumaBindingTest::test_core_complex_numa_binding_with_fewer_l3_than_gpu, test/test_numa_binding.py::NumaBindingTest::test_core_complex_prefers_caches_with_more_cpus, test/test_numa_binding.py::NumaBindingTest::test_core_complex_tiebreak_prefers_lower_cache_key, test/test_numa_binding.py::NumaBindingTest::test_default_numa_binding, test/test_numa_binding.py::NumaBindingTest::test_exclusive_numa_binding, test/test_numa_binding.py::NumaBindingTest::test_exclusive_raises_if_too_few_physical_cores, test/test_numa_binding.py::NumaBindingTest::test_explicit_numa_options_overrides_default, test/test_numa_binding.py::NumaBindingTest::test_fallback, test/test_numa_binding.py::NumaBindingTest::test_get_range_str_from_ints, test/test_numa_binding.py::NumaBindingTest::test_get_set_of_int_from_ranges_str, test/test_numa_binding.py::NumaBindingTest::test_no_numa_binding_if_numa_options_not_provided, test/test_numa_binding.py::NumaBindingTest::test_node_numa_binding, test/test_numa_binding.py::NumaBindingTest::test_nproc_must_equal_cuda_device_count_to_use_default_numa_options, test/test_numa_binding.py::NumaBindingTest::test_raises_if_binding_to_empty_set, test/test_numa_binding.py::NumaBindingTest::test_socket_numa_binding_with_multiple_numa_per_socket, test/test_numa_binding.py::NumaBindingTest::test_socket_numa_binding_with_single_numa_per_socket 2025-09-07T09:03:25.9187278Z 2025-09-07T09:03:25.9187412Z 
Running test_numba_integration 1/1 ... [2025-09-07 09:03:25.916962] 2025-09-07T09:03:25.9187647Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:03:25.9188206Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_numba_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:03:25.917242] 2025-09-07T09:03:28.0363119Z 2025-09-07T09:03:28.0364340Z test_numba_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numba_integration_1.1_c4fa1122a72b275c_.log 2025-09-07T09:03:28.0367910Z Running 8 items in this shard: test/test_numba_integration.py::TestNumbaIntegration::test_active_device, test/test_numba_integration.py::TestNumbaIntegration::test_array_adaptor, test/test_numba_integration.py::TestNumbaIntegration::test_conversion_errors, test/test_numba_integration.py::TestNumbaIntegration::test_cuda_array_interface, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_active_device, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_inferred_strides, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_lifetime 2025-09-07T09:03:28.0370335Z 2025-09-07T09:03:28.0370538Z Running test_numpy_interop 1/1 ... [2025-09-07 09:03:28.036214] 2025-09-07T09:03:28.0370934Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:03:28.0380596Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_numpy_interop.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 09:03:28.036537] 2025-09-07T09:03:30.6069339Z 2025-09-07T09:03:30.6070717Z test_numpy_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numpy_interop_1.1_94f1b54c34715905_.log 2025-09-07T09:03:30.6078849Z Running 44 items in this shard: test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_invalid_numpy_array_sequence_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_numpy_scalar_ctor_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_empty_tensors_interop_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_list_of_ndarray_warning_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_no_leak_on_invalid_dtype_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_zero_element_type_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_has_storage_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_multiplication_numpy_scalar_cuda, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_2_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_array_interface_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_multi_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_non_writeable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bfloat16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_unresizable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_overflow_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_bool_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_cuda, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_force_argument_cuda 2025-09-07T09:03:30.6083755Z 2025-09-07T09:03:30.6083841Z Running test_openmp 1/1 ... [2025-09-07 09:03:30.607044] 2025-09-07T09:03:30.6083993Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:03:30.6084362Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_openmp.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:03:30.607351] 2025-09-07T09:03:34.0792939Z 2025-09-07T09:03:34.0793968Z test_openmp 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_openmp_1.1_599abeb540462b79_.log 2025-09-07T09:03:34.0795203Z Running 2 items in this shard: test/test_openmp.py::TestOpenMP_ParallelFor::test_n_threads, test/test_openmp.py::TestOpenMP_ParallelFor::test_one_thread 2025-09-07T09:03:34.0795851Z 2025-09-07T09:03:34.0796059Z Running test_openreg 1/1 ... [2025-09-07 09:03:34.079221] 2025-09-07T09:03:34.2243512Z Processing /var/lib/jenkins/pytorch/test/cpp_extensions/open_registration_extension/torch_openreg 2025-09-07T09:03:34.3428646Z Preparing metadata (pyproject.toml) ... 
done 2025-09-07T09:03:34.3438290Z Requirement already satisfied: torch in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch_openreg==0.0.1) (2.9.0a0+git93fb23d) 2025-09-07T09:03:34.3444562Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (3.19.1) 2025-09-07T09:03:34.3453722Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (4.15.0) 2025-09-07T09:03:34.3454768Z Requirement already satisfied: setuptools in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (80.9.0) 2025-09-07T09:03:34.3455465Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (1.13.3) 2025-09-07T09:03:34.3456070Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (2.8.8) 2025-09-07T09:03:34.3456798Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (3.1.6) 2025-09-07T09:03:34.3457399Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch->torch_openreg==0.0.1) (2025.7.0) 2025-09-07T09:03:34.3504485Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from sympy>=1.13.3->torch->torch_openreg==0.0.1) (1.3.0) 2025-09-07T09:03:34.3518193Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from jinja2->torch->torch_openreg==0.0.1) (3.0.2) 2025-09-07T09:03:34.3549622Z Building wheels for collected packages: torch_openreg 2025-09-07T09:03:50.4490490Z Building wheel for torch_openreg (pyproject.toml) ... 
done 2025-09-07T09:03:50.4496317Z Created wheel for torch_openreg: filename=torch_openreg-0.0.1-cp312-cp312-linux_x86_64.whl size=289803 sha256=e7d553185565ca52642fd96a3bc1c83ef27b24d56ed3955bf13ad80e9f4484d7 2025-09-07T09:03:50.4497317Z Stored in directory: /tmp/pip-ephem-wheel-cache-2xptv12d/wheels/77/0e/44/354821851da27875d30edab8573104d155f90dd45b52779aa6 2025-09-07T09:03:50.4511089Z Successfully built torch_openreg 2025-09-07T09:03:50.5279479Z Installing collected packages: torch_openreg 2025-09-07T09:03:50.5389485Z Successfully installed torch_openreg-0.0.1 2025-09-07T09:03:50.5670859Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:03:50.5675704Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_openreg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:03:50.567069] 2025-09-07T09:03:53.6887113Z 2025-09-07T09:03:53.6888203Z test_openreg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_openreg_1.1_ae27c3647504b105_.log 2025-09-07T09:03:53.6896308Z Running 44 items in this shard: test/test_openreg.py::TestPrivateUse1::test_backend_dispatchstub, test/test_openreg.py::TestPrivateUse1::test_backend_generate_methods, test/test_openreg.py::TestPrivateUse1::test_backend_module_function, test/test_openreg.py::TestPrivateUse1::test_backend_module_methods, test/test_openreg.py::TestPrivateUse1::test_backend_module_registration, test/test_openreg.py::TestPrivateUse1::test_backend_name, test/test_openreg.py::TestPrivateUse1::test_backend_operator_registration, test/test_openreg.py::TestPrivateUse1::test_backend_packed_sequence_methods, test/test_openreg.py::TestPrivateUse1::test_backend_storage_methods, test/test_openreg.py::TestPrivateUse1::test_backend_tensor_methods, test/test_openreg.py::TestPrivateUse1::test_backend_tensor_type, 
test/test_openreg.py::TestPrivateUse1::test_backend_type_methods, test/test_openreg.py::TestOpenReg::test_autograd_init, test/test_openreg.py::TestOpenReg::test_compile_autograd_function_aliasing, test/test_openreg.py::TestOpenReg::test_compile_autograd_function_returns_self, test/test_openreg.py::TestOpenReg::test_copy_same_device, test/test_openreg.py::TestOpenReg::test_cross_device_copy, test/test_openreg.py::TestOpenReg::test_cross_diff_devices_copy, test/test_openreg.py::TestOpenReg::test_data_dependent_output, test/test_openreg.py::TestOpenReg::test_event_elapsed_time, test/test_openreg.py::TestOpenReg::test_event_wait_stream, test/test_openreg.py::TestOpenReg::test_expand, test/test_openreg.py::TestOpenReg::test_factory, test/test_openreg.py::TestOpenReg::test_fake_tensor, test/test_openreg.py::TestOpenReg::test_generator, test/test_openreg.py::TestOpenReg::test_manual_seed, test/test_openreg.py::TestOpenReg::test_named_tensor, test/test_openreg.py::TestOpenReg::test_open_device_cpu_serialization, test/test_openreg.py::TestOpenReg::test_open_device_dlpack, test/test_openreg.py::TestOpenReg::test_open_device_numpy_serialization, test/test_openreg.py::TestOpenReg::test_pin_memory, test/test_openreg.py::TestOpenReg::test_printing, test/test_openreg.py::TestOpenReg::test_quantize, test/test_openreg.py::TestOpenReg::test_record_event, test/test_openreg.py::TestOpenReg::test_resize, test/test_openreg.py::TestOpenReg::test_rewrapped_storage, test/test_openreg.py::TestOpenReg::test_rng_state, test/test_openreg.py::TestOpenReg::test_scalar_type_fallback, test/test_openreg.py::TestOpenReg::test_serialization, test/test_openreg.py::TestOpenReg::test_stream_synchronize, test/test_openreg.py::TestOpenReg::test_stream_wait_event, test/test_openreg.py::TestOpenReg::test_stream_wait_stream, test/test_openreg.py::TestOpenReg::test_tensor_type_fallback, test/test_openreg.py::TestOpenReg::test_tensorlist_type_fallback 2025-09-07T09:03:53.6901428Z 2025-09-07T09:03:53.6901516Z 
Running test_ops 1/4 ... [2025-09-07 09:03:53.688750] 2025-09-07T09:03:53.6901719Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:03:53.6902208Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_ops.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:03:53.689044] 2025-09-07T09:12:28.5611221Z 2025-09-07T09:12:28.5616437Z test_ops 1/4 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_1.4_841d671cc2f267a7_.log 2025-09-07T09:12:28.6636928Z Running 8491 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dstack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svd_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_grid_sample_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_t_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing__chunk_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nanmean_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_repeat_interleave_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___radd___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rand___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amin_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_igamma_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_alias_copy_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argsort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_combinations_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_permuted_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frac_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eig_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp2_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_channel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pca_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_qr_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_where_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda, test/test_ops.py::TestCommonCUDA::test_errors___radd___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_errors_complex_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dot_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmin_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_gather_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mean_cuda, test/test_ops.py::TestCommonCUDA::test_errors_median_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_errors_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_errors_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_tril_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bernoulli_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_det_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_masked_select_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rad2deg_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_xlog1py_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_put_accumulate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_inverse_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___getitem___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rand___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rxor___cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values__chunk_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values__unsafe_masked_index_put_accumulate_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argwhere_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_combinations_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gather_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_kron_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nansum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_constant_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_outer_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_neg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_multiple_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_copy_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_static_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_int_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logaddexp2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_square_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_layer_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_det_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logcumsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_inf_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_all_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_multiple_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_stft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__softmax_backward_data_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_item_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_2inputs_2outputs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_normal_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_long_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_channel_shuffle_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_linear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_logsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_one_hot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_loss_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resize_as__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch__scaled_mm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__efficient_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_consecutive_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_var_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_istft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal__in_place_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_polar_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_complex_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_polar_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cauchy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_permuted_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_complex_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_static_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_grad_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_copy_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_alias_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diff_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_geometric_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hsplit_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logit_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_grad_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resolve_neg_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_airy_ai_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_u_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isinf_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_in_place_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_airy_ai_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cartesian_prod_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expm1_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_celu_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softsign_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_in_place_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_airy_ai_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_copy_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_alias_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atanh_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_like_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_igamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_igammac_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_binary_cross_entropy_with_logits_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_replicate_negative_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_permute_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reshape_as_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_short_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_erfcx_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vsplit_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_float_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_short_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_alias_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_clone_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_physical_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_item_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_lerp_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_permute_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reciprocal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_split_with_sizes_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_conj_physical_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diff_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_double_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_equal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_exp2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_istft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_kron_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eig_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logcumsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_logsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_replicate_negative_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pinverse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_multiple_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_t_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_triangular_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__chunk_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_alias_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dot_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flipud_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_imag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logsumexp_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ravel_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcmul_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_decomposed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_angle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_einsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_int_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_kron_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linspace_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_and_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_rms_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_inf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_nuc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pinverse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_put_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scalar_tensor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sparse_sampled_addmm_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_sparse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_partial_views_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expm1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_minimum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal__in_place_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_randn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_remainder_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unflatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_where_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_xlogy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_lengths_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_decomposed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_angle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cartesian_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_column_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_count_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flipud_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kthvalue_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_diagonal_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mT_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_dropout_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_logsigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_circular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randint_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_bartlett_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_laguerre_polynomial_l_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_zeta_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_multiple_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsafe_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zero__cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cauchy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardtanh_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vstack_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cauchy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_block_diag_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_H_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_scatter_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft2_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_geometric_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_return_by_ref_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_maximum_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_roll_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__chunk_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__unsafe_masked_index_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ceil_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hash_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_item_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log2_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize__cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int64, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_bool, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int8, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bool, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_bool, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8, test/test_ops.py::TestTagsCUDA::test_tags_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___ror___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcdiv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ceil_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isreal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_lt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_round_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_select_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_stack_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_stft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_xlogy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcdiv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_decomposed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bfloat16_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cauchy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diff_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gather_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ge_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isreal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_item_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logcumsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mT_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_multinomial_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_narrow_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_celu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_nuc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ones_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pinverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polar_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unravel_index_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_unsafe_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsafe_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zero__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32 2025-09-07T09:12:28.7610360Z 2025-09-07T09:12:28.7610513Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T09:12:28.7610715Z Uploading artifacts took 0.00 seconds 2025-09-07T09:12:28.7610888Z Running test_ops_gradients 2/2 ... 
[2025-09-07 09:12:28.569231] 2025-09-07T09:12:28.7611064Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:12:28.7611472Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:12:28.569578] 2025-09-07T09:18:09.1946030Z 2025-09-07T09:18:09.1947712Z test_ops_gradients 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_2.2_f8de243285e5e9d1_.log 2025-09-07T09:18:09.2317379Z Running 2712 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyTakeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___radd___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rpow___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rsub___cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__softmax_backward_data_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmm_decomposed_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_aminmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_arange_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_asinh_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_broadcast_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_byte_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_chalf_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_char_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_column_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_constant_pad_nd_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cov_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cummax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagflat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dist_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dot_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expm1_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eye_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_eye_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfft2_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_frexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geometric_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_grid_sampler_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hsplit_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_igammac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_inner_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isnan_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isneginf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_binary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cross_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_det_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_ldl_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lstsq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_power_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_multi_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svdvals_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vander_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log10_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logaddexp2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logcumsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_not_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_or_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_xor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logit_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_normalize_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_meshgrid_list_of_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mvlgamma_mvlgamma_p_5_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanmedian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_ctc_loss_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_dropout2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_glu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_local_response_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_multilabel_soft_margin_loss_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_prelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softplus_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_tanhshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_inf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ones_like_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ormqr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_polar_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_roll_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_roll_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rot90_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_searchsorted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_short_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_kaiser_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sparse_mm_reduce_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_i1_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_zeta_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_take_along_dim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensor_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_transpose_copy_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_uniform_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_xlogy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyCubeCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySortCustomOp_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___getitem___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___radd___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmatmul___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_asinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_2d_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bfloat16_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bool_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bool_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cdouble_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_clamp_max_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_column_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cond_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_contiguous_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cov_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagflat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dist_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_floor_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_no_rounding_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_equal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exp_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_expm1_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ihfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flip_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_gradient_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_grid_sampler_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_half_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_histc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_add_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_inner_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_invoke_quant_packed_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_invoke_subgraph_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_4inputs_with_extra_args_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_householder_product_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_solve_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vander_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vecdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_tensor_overload_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log10_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_xor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_long_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lt_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_scatter_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_min_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_multinomial_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmedian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_narrow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_max_pool1d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_embedding_bag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_gelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_group_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_kl_div_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_linear_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool2d_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_mse_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_circular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_circular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_prelu_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_inf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ormqr_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_randn_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_real_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resize__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_roll_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rsqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rsqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scan_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_add_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_short_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_hann_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_kaiser_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_slice_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_legendre_polynomial_p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_copy_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_take_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_topk_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triangular_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tril_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_uniform_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_split_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_where_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_while_loop_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_H_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyCatCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__native_batch_norm_legit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__softmax_backward_data_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acos_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_aminmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_as_strided_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atan_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_auto_functionalize_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bfloat16_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bool_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bool_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cauchy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cdouble_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_min_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_contiguous_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diag_embed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagflat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_equal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erfinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exponential_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_ifft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_ifftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_ihfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_rfftn_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_gather_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_igamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_igammac_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_imag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_inner_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_inner_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_invoke_subgraph_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isnan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_2inputs_2outputs_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_binary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_solve_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_ex_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_vector_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_not_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logsumexp_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_long_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_triple_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_select_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_maximum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_list_of_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mul_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmedian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_narrow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nextafter_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_cosine_similarity_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_embedding_bag_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_linear_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_mish_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_replicate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softsign_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_tanhshrink_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_tanhshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_unfold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_in_place_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ormqr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_put_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_randn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resize__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_round_decimals_0_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scan_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_slice_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sparse_mm_reduce_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_y0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_i1_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_ndtri_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_xlog1py_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_split_list_args_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_split_with_sizes_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stft_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trapz_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tril_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unbind_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unbind_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unfold_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_while_loop_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_xlogy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_H_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rmatmul___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rmod___cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rpow___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rpow___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addcmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_all_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_aminmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_as_strided_partial_views_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bernoulli_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bfloat16_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bool_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bool_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cartesian_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cholesky_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cummin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagonal_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_floor_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_einsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_equal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expand_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expm1_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flipud_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gather_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_geometric_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_geqrf_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isneginf_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_binary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ldexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_householder_product_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_qr_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorsolve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorsolve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vander_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log10_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_with_dtype_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_not_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_fill_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matrix_exp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_max_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_maximum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_list_of_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_batch_norm_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv3d_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_dropout2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_embedding_bag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_huber_loss_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_local_response_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_mse_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_prelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_relu6_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softsign_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_tanhshrink_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_outer_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_outer_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pinverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_polar_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_polygamma_polygamma_n_1_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_polygamma_polygamma_n_4_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_positive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pow_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_repeat_interleave_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize__cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_roll_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rot90_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_bartlett_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_hann_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_mm_reduce_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_bessel_y0_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sqrt_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_squeeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_take_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_torch_ops_aten__safe_softmax_default_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unfold_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unfold_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_as_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zero__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_H_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_T_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___getitem___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___radd___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmatmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rpow___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rpow___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcmul_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_decomposed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_aminmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bfloat16_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_byte_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cov_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumulative_trapezoid_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagflat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagflat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_digamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dist_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_like_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_eq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_equal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_equal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expand_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifftshift_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ihfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_divide_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_grid_sampler_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_igamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_put_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isreal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_binary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_unary_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_det_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_solve_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_factor_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_multi_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_singular_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_tensorsolve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_vander_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logcumsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logdet_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_not_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_or_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_or_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_argmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_median_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_max_pool2d_with_indices_backward_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_max_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_min_reduction_no_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_movedim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmedian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_dropout_backward_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_batch_norm_without_cudnn_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_dropout3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_embedding_bag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_glu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_l1_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_l1_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_local_response_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_mse_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_circular_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_tanhshrink_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_inf_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_in_place_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_outer_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randint_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randint_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_repeat_interleave_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_reshape_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_decimals_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsqrt_cuda_complex128, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sgn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_short_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_hann_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_kaiser_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_mm_reduce_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_bessel_y1_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_erfcx_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_hermite_polynomial_h_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_squeeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_std_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_svd_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_take_along_dim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_true_divide_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unfold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_uniform_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_chunk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_float64, 
test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zeros_cuda_complex128 2025-09-07T09:18:09.2661769Z 2025-09-07T09:18:09.2661872Z Running test_quantization 4/5 ... [2025-09-07 09:18:09.202053] 2025-09-07T09:18:09.2662041Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:18:09.2671829Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_quantization.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 09:18:09.202264] 2025-09-07T09:30:35.3815691Z 2025-09-07T09:30:35.3817769Z test_quantization 4/5 was successful, full logs can be found in artifacts with path test/test-reports/test_quantization_4.5_26e8d0c9eac8bd04_.log 2025-09-07T09:30:35.3864437Z Running 263 items in this shard: test/test_quantization.py::TestQuantizedOps::test_cat, test/test_quantization.py::TestQuantizedOps::test_custom_module_lstm, test/test_quantization.py::TestQuantizedOps::test_hardtanh, test/test_quantization.py::TestQuantizedOps::test_int8_mul_onednn, test/test_quantization.py::TestQuantizedOps::test_max_pool1d, test/test_quantization.py::TestQuantizedOps::test_max_pool2d_cudnn, test/test_quantization.py::TestQuantizedOps::test_mean, test/test_quantization.py::TestQuantizedOps::test_qclamp, test/test_quantization.py::TestQuantizedOps::test_qhardsigmoid, test/test_quantization.py::TestQuantizedOps::test_qtopk, test/test_quantization.py::TestQuantizedOps::test_sigmoid, test/test_quantization.py::TestQNNPackOps::test_adaptive_avg_pool2d, test/test_quantization.py::TestQNNPackOps::test_qnnpack_mul, test/test_quantization.py::TestQuantizedLinear::test_qlinear_gelu_fp8, test/test_quantization.py::TestQuantizedLinear::test_qlinear_relu, test/test_quantization.py::TestQuantizedLinear::test_qlinear_with_input_q_dq_qweight_dq_output_fp32, test/test_quantization.py::TestQuantizedLinear::test_wrapped_quantized_linear_prepacked, test/test_quantization.py::TestQuantizedConv::test_qconv1d_relu_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d, test/test_quantization.py::TestQuantizedConv::test_qconv2d_fp8, test/test_quantization.py::TestQuantizedConv::test_qconv2d_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_relu, test/test_quantization.py::TestQuantizedConv::test_qconv2d_sum_relu_float_output_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv2d_swish_pt2e, test/test_quantization.py::TestQuantizedConv::test_qconv3d_relu, 
test/test_quantization.py::TestQuantizedConv::test_qconv_transpose3d, test/test_quantization.py::TestDynamicQuantizedOps::test_dynamic_conv3d, test/test_quantization.py::TestDynamicQuantizedOps::test_linear_dynamic_fp16_onednn, test/test_quantization.py::TestDynamicQuantizedOps::test_qlstmGRU, test/test_quantization.py::TestPadding::test_reflection_pad2d, test/test_quantization.py::TestQuantizedEmbeddingOps::test_embedding_bag_byte, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quant_per_channel_qparam_range, test/test_quantization.py::TestFakeQuantizeOps::test_fake_quant_preserves_qparam_shapes_for_activations, test/test_quantization.py::TestFakeQuantizeOps::test_fixed_qparams_fq_module, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_backward_per_channel_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_backward_per_tensor_cuda, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_channel_cpu, test/test_quantization.py::TestFakeQuantizeOps::test_learnable_forward_per_tensor_cuda, test/test_quantization.py::TestFusedObsFakeQuant::test_fused_obs_fake_quant_moving_avg, test/test_quantization.py::TestFusedObsFakeQuant::test_fused_obs_fake_quant_moving_avg_per_channel, test/test_quantization.py::TestQuantizedTensor::test_bfp16_quantize, test/test_quantization.py::TestQuantizedTensor::test_choose_qparams, test/test_quantization.py::TestQuantizedTensor::test_choose_qparams_optimized, test/test_quantization.py::TestQuantizedTensor::test_compare_per_tensor_device_numerics, test/test_quantization.py::TestQuantizedTensor::test_decomposed_dynamic_quant_pattern, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_channel_bfloat16_input, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_channel_group, test/test_quantization.py::TestQuantizedTensor::test_decomposed_quantize_per_token, test/test_quantization.py::TestQuantizedTensor::test_qtensor_cpu, 
test/test_quantization.py::TestQuantizedTensor::test_qtensor_equal, test/test_quantization.py::TestQuantizedTensor::test_qtensor_fill_per_channel, test/test_quantization.py::TestQuantizedTensor::test_qtensor_fill_per_channel_nhwc, test/test_quantization.py::TestQuantizedTensor::test_qtensor_float_assignment, test/test_quantization.py::TestQuantizedTensor::test_qtensor_masked_fill_cpu, test/test_quantization.py::TestQuantizedTensor::test_qtensor_per_channel_load_save, test/test_quantization.py::TestQuantizedTensor::test_qtensor_quantize_per_channel, test/test_quantization.py::TestQuantizedTensor::test_qtensor_reshape, test/test_quantization.py::TestQuantizedTensor::test_qtensor_unsqueeze, test/test_quantization.py::TestQuantizedTensor::test_quantize_per_channel_sub_byte, test/test_quantization.py::TestFakeQuantize::test_quant_min_max_override, test/test_quantization.py::TestObserver::test_dynamic_quant_observer_matching_choose_qparams, test/test_quantization.py::TestObserver::test_histogram_observer_consistent_buffer_shape, test/test_quantization.py::TestObserver::test_per_tensor_observers, test/test_quantization.py::TestObserver::test_save_load_state_dict_script, test/test_quantization.py::TestStaticQuantizedModule::test_dropout, test/test_quantization.py::TestStaticQuantizedModule::test_embedding_bag_api, test/test_quantization.py::TestStaticQuantizedModule::test_prelu, test/test_quantization.py::TestDynamicQuantizedModule::test_cell_api, test/test_quantization.py::TestDynamicQuantizedModule::test_dynamic_convtranspose3d, test/test_quantization.py::TestReferenceQuantizedModule::test_rnn, test/test_quantization.py::TestDistributed::test_device_affinity, test/test_quantization.py::TestDistributed::test_observers_preserve_buffers, test/test_quantization.py::TestFusedObsFakeQuantModule::test_fused_obs_fq_moving_avg_module, test/test_quantization.py::TestBackendConfig::test_backend_op_config_from_dict, 
test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_input_type_to_index, test/test_quantization.py::TestBackendConfig::test_backend_op_config_set_root_node_getter, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_forward_hooks_preserved, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_nested2, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_nested3, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_resnet_base, test/test_quantization.py::TestQuantizeEagerPTQStatic::test_skip_quant, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_linear_relu_fusion, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_nested1, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_per_channel_linear_quantize, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_quantized_rnn, test/test_quantization.py::TestQuantizeEagerPTQDynamic::test_type_match_rule, test/test_quantization.py::TestQuantizeEagerOps::test_conv_3d, test/test_quantization.py::TestQuantizeEagerQAT::test_conv_linear_symm, test/test_quantization.py::TestQuantizeEagerQAT::test_embedding_qat_qconfig_equal, test/test_quantization.py::TestFuseEager::test_fuse_module_train, test/test_quantization.py::TestFuseEager::test_fusion_linear_bn_eval, test/test_quantization.py::TestFuseEager::test_fusion_sequential_model_eval, test/test_quantization.py::TestFuseEager::test_fusion_sequential_model_train, test/test_quantization.py::TestModelNumericsEager::test_float_quant_compare_per_channel, test/test_quantization.py::TestModelNumericsEager::test_float_quant_compare_per_tensor, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_outputs_conv_static, test/test_quantization.py::TestNumericSuiteEager::test_compare_model_stub_linear_static, test/test_quantization.py::TestNumericSuiteEager::test_output_logger, test/test_quantization.py::TestEqualizeEager::test_equalize_fused_linearrelu, 
test/test_quantization.py::TestBiasCorrectionEager::test_linear_chain, test/test_quantization.py::TestFuseFx::test_fuse_conv_bn_add_relu_by_default, test/test_quantization.py::TestQuantizeFx::test_convert_custom_config_to_dict, test/test_quantization.py::TestQuantizeFx::test_custom_module_class_input_has_duplicate_nodes, test/test_quantization.py::TestQuantizeFx::test_default_qconfig_mapping_override_global, test/test_quantization.py::TestQuantizeFx::test_dequantize, test/test_quantization.py::TestQuantizeFx::test_dynamic_quant_fp16, test/test_quantization.py::TestQuantizeFx::test_dynamic_with_fusion_multiple_uses, test/test_quantization.py::TestQuantizeFx::test_fp32_sum, test/test_quantization.py::TestQuantizeFx::test_fuse_custom_config_from_dict, test/test_quantization.py::TestQuantizeFx::test_fuse_custom_config_to_dict, test/test_quantization.py::TestQuantizeFx::test_linear_shape_view, test/test_quantization.py::TestQuantizeFx::test_linear_size_view, test/test_quantization.py::TestQuantizeFx::test_lowering_functional_conv_transpose_with_kwargs, test/test_quantization.py::TestQuantizeFx::test_match_pattern_with_multiple_args, test/test_quantization.py::TestQuantizeFx::test_output_lists_and_dicts, test/test_quantization.py::TestQuantizeFx::test_pattern_match_constant, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_input_quantized_indexes, test/test_quantization.py::TestQuantizeFx::test_prepare_custom_config_set_standalone_module_name, test/test_quantization.py::TestQuantizeFx::test_preserve_attributes, test/test_quantization.py::TestQuantizeFx::test_preserve_tuple, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_dict_split_tuple_args, test/test_quantization.py::TestQuantizeFx::test_propagate_dtypes_for_known_nodes_list_args, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_repr, test/test_quantization.py::TestQuantizeFx::test_qconfig_mapping_set_module_name_object_type_order, 
test/test_quantization.py::TestQuantizeFx::test_qparams_fqn, test/test_quantization.py::TestQuantizeFx::test_relu_lowering, test/test_quantization.py::TestQuantizeFx::test_return_none, test/test_quantization.py::TestQuantizeFx::test_sequential, test/test_quantization.py::TestQuantizeFx::test_shape_followed_by_quantized_op, test/test_quantization.py::TestQuantizeFx::test_standalone_module_quantized_interface, test/test_quantization.py::TestQuantizeFx::test_static_lstm_consume_tuple, test/test_quantization.py::TestQuantizeFxOps::test_cat, test/test_quantization.py::TestQuantizeFxOps::test_conv_transpose_1d, test/test_quantization.py::TestQuantizeFxOps::test_fixed_qparams_ops_qint8, test/test_quantization.py::TestQuantizeFxOps::test_functional_linear, test/test_quantization.py::TestQuantizeFxOps::test_int8_input_no_unnecessary_fq, test/test_quantization.py::TestQuantizeFxOps::test_linear_module, test/test_quantization.py::TestQuantizeFxOps::test_mul, test/test_quantization.py::TestQuantizeFxOps::test_qbatch_norm, test/test_quantization.py::TestQuantizeFxOps::test_qmatmul, test/test_quantization.py::TestQuantizeFxOps::test_reshape_fp16, test/test_quantization.py::TestQuantizeFxOps::test_silu_reference, test/test_quantization.py::TestQuantizeFxOps::test_softmax_reference, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_quantization.py::TestSubgraphRewriter::test_subgraph_rewriter_internal_pattern_nodes_cannot_have_users_that_are_not_matched, test/test_quantization.py::TestDuplicateDQPass::test_avgpool_use_different_qconfig, test/test_quantization.py::TestMetaDataPorting::test_metadata_porting_for_dq_no_static_q, test/test_quantization.py::TestMetaDataPorting::test_no_metadata_porting_through_unknown_ops, test/test_quantization.py::TestNumericDebugger::test_re_export_preserve_handle, 
test/test_quantization.py::TestQuantizePT2E::test_allow_exported_model_train_eval, test/test_quantization.py::TestQuantizePT2E::test_constant_prop_preserve_metadata, test/test_quantization.py::TestQuantizePT2E::test_derived_qspec, test/test_quantization.py::TestQuantizePT2E::test_derived_qspec_per_channel, test/test_quantization.py::TestQuantizePT2E::test_fold_all_ops_before_quantize, test/test_quantization.py::TestQuantizePT2E::test_fold_quantize, test/test_quantization.py::TestQuantizePT2E::test_input_edge_sanity_check, test/test_quantization.py::TestQuantizePT2E::test_move_exported_model_bn_device_cpu, test/test_quantization.py::TestQuantizePT2E::test_multi_users_without_output_observer, test/test_quantization.py::TestQuantizePT2E::test_preserve_nn_module_stack, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_bfloat16_float8_e5m2, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_float32_float8_e5m2, test/test_quantization.py::TestQuantizePT2E::test_quantization_dtype_float32_int16, test/test_quantization.py::TestQuantizePT2E::test_speed, test/test_quantization.py::TestPT2ERepresentation::test_add, test/test_quantization.py::TestPT2ERepresentation::test_conv2d, test/test_quantization.py::TestXNNPACKQuantizer::test_cat_same_node, test/test_quantization.py::TestXNNPACKQuantizer::test_set_module_type_case_2, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_adaptive_avg_pool2d_recipe, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_conv2d_binary2, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_conv2d_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_filter_linear_recipe, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_binary_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_linear_unary, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_qat_dynamic_quant_linear, 
test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_qconfig, test/test_quantization.py::TestQuantizePT2EX86Inductor::test_set_module_name_qconfig_for_dynamic_quant, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_fold_bn_erases_add_node, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_conv_bn_per_channel_weight_bias, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_preserve_source_fn_stack, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn1d::test_qat_update_shared_qspec, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_fold_bn_erases_add_node, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_bn_per_channel_weight_bias, test/test_quantization.py::TestQuantizePT2EQAT_ConvBn2d::test_qat_conv_transpose_bn, test/test_quantization.py::TestQuantizePT2EQATModels::test_qat_resnet18, test/test_quantization.py::TestFXGraphMatcher::test_op_relationship_mapping, test/test_quantization.py::TestFXGraphMatcher::test_simple_fusion, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_add_shadow_loggers_mod_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extend_logger_results_with_comparison, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_fqn, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_linear_fun_ptq, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_extract_weights_linear_fun_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_fp32_coverage, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_int8_shadows_fp32_simple, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_linear_fp16_activations, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_linear_kwargs_shadow, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_mod_ptq, 
test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_match_activations_mod_qat, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_op_io_dtype_coverage, test/test_quantization.py::TestFXNumericSuiteCoreAPIs::test_shadow_activations_fqn, test/test_quantization.py::TestFXNumericSuiteNShadows::test_custom_functions_and_tracer, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_repr, test/test_quantization.py::TestFXNumericSuiteNShadows::test_qconfig_multi_mapping_retroactive_padding, test/test_quantization.py::TestFXNumericSuiteCoreAPIsModels::test_resnet18, test/test_quantization.py::TestFxModelReportDetector::test_fusion_layer_in_sequential, test/test_quantization.py::TestFxModelReportDetector::test_qat_aware_model_example, test/test_quantization.py::TestFxModelReportObserver::test_single_batch_of_ones, test/test_quantization.py::TestFxModelReportObserver::test_zero_tensor_errors, test/test_quantization.py::TestFxModelReportClass::test_constructor, test/test_quantization.py::TestFxDetectInputWeightEqualization::test_input_weight_equalization_determine_points, test/test_quantization.py::TestFxDetectInputWeightEqualization::test_input_weight_equalization_report_gen, test/test_quantization.py::TestFxDetectOutliers::test_outlier_detection_determine_points, test/test_quantization.py::TestFxModelReportVisualizer::test_generate_tables_no_match, test/test_quantization.py::TestFxModelReportVisualizer::test_generate_tables_single_feat_match, test/test_quantization.py::TestEqualizeFx::test_input_weight_eq_observer, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_activation_values, test/test_quantization.py::TestEqualizeFx::test_input_weight_equalization_weights_bias, test/test_quantization.py::TestSerialization::test_conv2d_nobias_graph, test/test_quantization.py::TestSerialization::test_conv2d_nobias_graph_v3, test/test_quantization.py::TestSerialization::test_conv3d_relu, 
test/test_quantization.py::TestSerialization::test_default_qat_qconfig, test/test_quantization.py::TestSerialization::test_linear, test/test_quantization.py::TestQuantizeJit::test_conv_bn, test/test_quantization.py::TestQuantizeJitPasses::test_finalize_for_linear, test/test_quantization.py::TestQuantizeJitPasses::test_foldbn_trivial, test/test_quantization.py::TestQuantizeJitPasses::test_insert_observers_for_if_consistent_observation, test/test_quantization.py::TestQuantizeJitPasses::test_insert_quant_dequant, test/test_quantization.py::TestQuantizeJitPasses::test_replicate_dequant_same_value, test/test_quantization.py::TestQuantizeJitPasses::test_replicate_dequantize_in_block, test/test_quantization.py::TestQuantizeJitPasses::test_replicate_quantize_for_if, test/test_quantization.py::TestQuantizeJitPasses::test_swap_functional_linear, test/test_quantization.py::TestQuantizeJitOps::test_dequantize_tuple, test/test_quantization.py::TestQuantizeJitOps::test_linear, test/test_quantization.py::TestQuantizeJitOps::test_qbatch_norm_relu_BNRelu, test/test_quantization.py::TestQuantizeJitOps::test_quantized_add_relu_alpha, test/test_quantization.py::TestQuantizeJitOps::test_quantized_cat, test/test_quantization.py::TestQuantizeJitOps::test_quantized_mul, test/test_quantization.py::TestQuantizeJitOps::test_quantized_mul_scalar, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_convert_dynamic_fp16, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_dynamic_multi_op, test/test_quantization.py::TestQuantizeDynamicJitPasses::test_prepare_dynamic_child_qconfig, test/test_quantization.py::TestDeprecatedJitQuantized::test_rnn_quantized, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_fake_quantize, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_fuser_method_mappings, test/test_quantization.py::TestAOMigrationQuantization::test_function_import_qconfig, 
test/test_quantization.py::TestAOMigrationQuantization::test_function_import_quant_type, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_qat_conv, test/test_quantization.py::TestAOMigrationNNQuantized::test_import_nn_qat_dynamic_linear, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_functional_modules, test/test_quantization.py::TestAOMigrationNNQuantized::test_modules_normalization, test/test_quantization.py::TestAOMigrationNNIntrinsic::test_modules_intrinsic_qat_linear_fused, test/test_quantization.py::TestAOMigrationNNIntrinsic::test_modules_intrinsic_qat_linear_relu, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_quantization_patterns, test/test_quantization.py::TestAOMigrationQuantizationFx::test_function_import_fx_utils, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_rte_cuda_float8_e5m2, test/test_quantization.py::TestFloat8DtypeCUDA::test_cast_round_trip_rte_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_creation_with_zeros_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_empty_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_finfo_cuda_float8_e8m0fnu, test/test_quantization.py::TestFloat8DtypeCUDA::test_save_load_cuda_float8_e4m3fn, test/test_quantization.py::TestFloat8DtypeCUDA::test_save_load_cuda_float8_e5m2, test/test_quantization.py::TestFloat8DtypeCUDA::test_special_numbers_cuda_float8_e4m3fn 2025-09-07T09:30:35.3901776Z 2025-09-07T09:30:35.3901924Z Running test_stateless 1/1 ... 
[2025-09-07 09:30:35.381911] 2025-09-07T09:30:35.3902106Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:30:35.3902476Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_stateless.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:30:35.382173] 2025-09-07T09:30:41.8106642Z 2025-09-07T09:30:41.8112879Z test_stateless 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_stateless_1.1_6c0c47d57f8690ae_.log 2025-09-07T09:30:41.8125011Z Running 50 items in this shard: test/test_stateless.py::TestStatelessFunctionalAPI::test_circular_references_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_circular_references_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_batch_norm_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_batch_norm_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_member_reference_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_member_reference_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_multiple_dicts_error, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_tuple_dicts, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_error_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_error_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_torch_func, 
test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_gradient_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_gradient_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_jit_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_jit_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_kwargs_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_kwargs_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_in_place_operator_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_in_place_operator_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_module_fail_reset_to_original_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_module_fail_reset_to_original_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_some_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_some_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_special_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_special_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_some_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_some_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_strict_stateless, 
test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrized_module_change_parametrization_original_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrized_module_change_parametrization_original_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_errors_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_errors_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_no_error_without_flag, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_warns_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_warns_torch_func, test/test_stateless.py::TestStatelessDeprecation::test_private_stateless_warns, test/test_stateless.py::TestStatelessDeprecation::test_stateless_functional_call_warns, test/test_stateless.py::TestPythonOptimizeMode::test_runs_with_optimize_flag 2025-09-07T09:30:41.8132830Z 2025-09-07T09:30:41.8132911Z Running test_sympy_utils 1/1 ... [2025-09-07 09:30:41.810498] 2025-09-07T09:30:41.8133081Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:30:41.8133489Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_sympy_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-09-07 09:30:41.810766] 2025-09-07T09:30:52.6566302Z 2025-09-07T09:30:52.6567611Z test_sympy_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sympy_utils_1.1_2b6a2b79d1235c96_.log 2025-09-07T09:30:52.6594711Z Running 209 items in this shard: test/test_sympy_utils.py::TestNumbers::test_float_cast, test/test_sympy_utils.py::TestNumbers::test_int_infinity, test/test_sympy_utils.py::TestNumbers::test_lt_self, test/test_sympy_utils.py::TestNumbers::test_mixed_oo_int_oo, test/test_sympy_utils.py::TestNumbers::test_relation, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_and_, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_or_, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_add_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_add_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_and_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_and_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_or_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_or_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_floordiv_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_floordiv_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_maximum_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_maximum_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_minimum_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_minimum_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mod_dtype_float, 
test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mod_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mul_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mul_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_by_natural_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_by_natural_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_sub_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_sub_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_truediv_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_truediv_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_add, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_eq, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_floordiv, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_ge, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_gt, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_le, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_lt, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_maximum, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_minimum, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_mod, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_mul, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_ne, 
test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_pow, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_pow_by_natural, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_sub, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_truediv, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_mul_zero_unknown, test/test_sympy_utils.py::TestValueRanges::test_pow_half, test/test_sympy_utils.py::TestValueRanges::test_unary_bool_ref_range_fn_not_, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_abs_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_abs_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_ceil_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_ceil_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_exp_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_exp_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_floor_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_floor_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_log_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_log_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_neg_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_neg_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_reciprocal_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_reciprocal_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_sqrt_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_sqrt_dtype_int, 
test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_square_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_square_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_abs, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_ceil, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_exp, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_floor, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_log, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_neg, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_reciprocal, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_sqrt, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_mod, 
test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_truediv, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_maximum, 
test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_truediv, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_le, 
test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_truediv, test/test_sympy_utils.py::TestSympySolve::test_addition, test/test_sympy_utils.py::TestSympySolve::test_floordiv_Equality, test/test_sympy_utils.py::TestSympySolve::test_floordiv_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_LessThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_Unequality, test/test_sympy_utils.py::TestSympySolve::test_floordiv_eq_simplify, test/test_sympy_utils.py::TestSympySolve::test_give_up, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_Equality, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_Unequality, 
test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_LessThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_Equality, test/test_sympy_utils.py::TestSympySolve::test_noop_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_LessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_Unequality, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_Equality, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_LessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_Unequality, test/test_sympy_utils.py::TestSympySolve::test_simple_floordiv_gcd, test/test_sympy_utils.py::TestSympySolve::test_z3_proof_floordiv_eq_simplify, test/test_sympy_utils.py::TestSympyFunctions::test_pickle, test/test_sympy_utils.py::TestSingletonInt::test_basic, test/test_sympy_utils.py::TestIdentity::test_cast_identity_float, test/test_sympy_utils.py::TestIdentity::test_cast_identity_illegal, test/test_sympy_utils.py::TestIdentity::test_cast_identity_int, test/test_sympy_utils.py::TestIdentity::test_expand_identity, test/test_sympy_utils.py::TestTypedExpr::test_typed_expr 2025-09-07T09:30:52.6619086Z 2025-09-07T09:30:52.6619187Z Running test_tensorboard 1/1 ... 
[2025-09-07 09:30:52.656782] 2025-09-07T09:30:52.6619424Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:30:52.6619832Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_tensorboard.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:30:52.657026] 2025-09-07T09:30:55.7811106Z 2025-09-07T09:30:55.7812291Z test_tensorboard 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorboard_1.1_50092a6040c13544_.log 2025-09-07T09:30:55.7818193Z Running 50 items in this shard: test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_autograd_np, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_histogram, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_histogram_raw, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_np, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_write, test/test_tensorboard.py::TestTensorBoardUtils::test_convert_to_HWC_dtype_remains_same, test/test_tensorboard.py::TestTensorBoardUtils::test_numpy_vid_uint8, test/test_tensorboard.py::TestTensorBoardUtils::test_prepare_video, test/test_tensorboard.py::TestTensorBoardUtils::test_to_HWC, test/test_tensorboard.py::TestTensorBoardWriter::test_writer, test/test_tensorboard.py::TestTensorBoardSummaryWriter::test_pathlib, test/test_tensorboard.py::TestTensorBoardSummaryWriter::test_summary_writer_close, test/test_tensorboard.py::TestTensorBoardSummaryWriter::test_summary_writer_ctx, test/test_tensorboard.py::TestTensorBoardEmbedding::test_embedding, test/test_tensorboard.py::TestTensorBoardEmbedding::test_embedding_64, test/test_tensorboard.py::TestTensorBoardSummary::test_audio, test/test_tensorboard.py::TestTensorBoardSummary::test_custom_scalars, test/test_tensorboard.py::TestTensorBoardSummary::test_empty_input, 
test/test_tensorboard.py::TestTensorBoardSummary::test_float32_image, test/test_tensorboard.py::TestTensorBoardSummary::test_histogram_auto, test/test_tensorboard.py::TestTensorBoardSummary::test_histogram_doane, test/test_tensorboard.py::TestTensorBoardSummary::test_histogram_fd, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_3_channel_batched, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_boxes, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_one_channel, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_one_channel_batched, test/test_tensorboard.py::TestTensorBoardSummary::test_image_without_channel, test/test_tensorboard.py::TestTensorBoardSummary::test_list_input, test/test_tensorboard.py::TestTensorBoardSummary::test_mesh, test/test_tensorboard.py::TestTensorBoardSummary::test_scalar_new_style, test/test_tensorboard.py::TestTensorBoardSummary::test_text, test/test_tensorboard.py::TestTensorBoardSummary::test_uint8_image, test/test_tensorboard.py::TestTensorBoardSummary::test_video, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_mlp_graph, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_nested_nn_squential, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_pytorch_graph, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_pytorch_graph_dict_input, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_torchvision_smoke, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_wrong_input_size, test/test_tensorboard.py::TestTensorBoardFigure::test_figure, test/test_tensorboard.py::TestTensorBoardFigure::test_figure_list, test/test_tensorboard.py::TestTensorBoardNumpy::test_pytorch_np_expect_fail, test/test_tensorboard.py::TestTensorBoardNumpy::test_scalar, test/test_tensorboard.py::TestTensorProtoSummary::test_complex_tensor_proto, test/test_tensorboard.py::TestTensorProtoSummary::test_empty_tensor_proto, 
test/test_tensorboard.py::TestTensorProtoSummary::test_float_tensor_proto, test/test_tensorboard.py::TestTensorProtoSummary::test_half_tensor_proto_bfloat16_proto_type_14, test/test_tensorboard.py::TestTensorProtoSummary::test_half_tensor_proto_float16_proto_type_19, test/test_tensorboard.py::TestTensorProtoSummary::test_int_tensor_proto, test/test_tensorboard.py::TestTensorProtoSummary::test_scalar_tensor_proto 2025-09-07T09:30:55.7823429Z 2025-09-07T09:30:55.7823509Z Running test_tensorexpr 1/1 ... [2025-09-07 09:30:55.781054] 2025-09-07T09:30:55.7823668Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:30:55.7828559Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_tensorexpr.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:30:55.781285] 2025-09-07T09:31:23.7637845Z 2025-09-07T09:31:23.7638524Z test_tensorexpr 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_1.1_a0066b5a1503e5f1_.log 2025-09-07T09:31:23.7645542Z Running 74 items in this shard: test/test_tensorexpr.py::TestTensorExprFuser::test_add_const_rhs, test/test_tensorexpr.py::TestTensorExprFuser::test_add_sub, test/test_tensorexpr.py::TestTensorExprFuser::test_alias_analysis_input_and_module, test/test_tensorexpr.py::TestTensorExprFuser::test_alias_analysis_inputs, test/test_tensorexpr.py::TestTensorExprFuser::test_alias_analysis_module, test/test_tensorexpr.py::TestTensorExprFuser::test_all_combos, test/test_tensorexpr.py::TestTensorExprFuser::test_alpha, test/test_tensorexpr.py::TestTensorExprFuser::test_binary_ops, test/test_tensorexpr.py::TestTensorExprFuser::test_bitwise_ops, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast3, test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast_2, 
test/test_tensorexpr.py::TestTensorExprFuser::test_broadcast_big2, test/test_tensorexpr.py::TestTensorExprFuser::test_cat, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_empty_tensors, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_negative_dim, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_only, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_promote_inputs, test/test_tensorexpr.py::TestTensorExprFuser::test_cat_with_constant_dim, test/test_tensorexpr.py::TestTensorExprFuser::test_char, test/test_tensorexpr.py::TestTensorExprFuser::test_chunk, test/test_tensorexpr.py::TestTensorExprFuser::test_clamp, test/test_tensorexpr.py::TestTensorExprFuser::test_constant, test/test_tensorexpr.py::TestTensorExprFuser::test_double, test/test_tensorexpr.py::TestTensorExprFuser::test_double_intrinsics, test/test_tensorexpr.py::TestTensorExprFuser::test_dynamic_shape, test/test_tensorexpr.py::TestTensorExprFuser::test_easy, test/test_tensorexpr.py::TestTensorExprFuser::test_eq, test/test_tensorexpr.py::TestTensorExprFuser::test_exp_pow, test/test_tensorexpr.py::TestTensorExprFuser::test_four_arg, test/test_tensorexpr.py::TestTensorExprFuser::test_ge, test/test_tensorexpr.py::TestTensorExprFuser::test_gt, test/test_tensorexpr.py::TestTensorExprFuser::test_guard_fails, test/test_tensorexpr.py::TestTensorExprFuser::test_half_bn_relu, test/test_tensorexpr.py::TestTensorExprFuser::test_half_gelu, test/test_tensorexpr.py::TestTensorExprFuser::test_int64_promotion, test/test_tensorexpr.py::TestTensorExprFuser::test_int_output, test/test_tensorexpr.py::TestTensorExprFuser::test_le, test/test_tensorexpr.py::TestTensorExprFuser::test_loop, test/test_tensorexpr.py::TestTensorExprFuser::test_lt, test/test_tensorexpr.py::TestTensorExprFuser::test_mask, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction2, 
test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction_dim1, test/test_tensorexpr.py::TestTensorExprFuser::test_min_max_reduction_dim1_2, test/test_tensorexpr.py::TestTensorExprFuser::test_multi_rand, test/test_tensorexpr.py::TestTensorExprFuser::test_multioutput, test/test_tensorexpr.py::TestTensorExprFuser::test_multiple_outputs, test/test_tensorexpr.py::TestTensorExprFuser::test_nans, test/test_tensorexpr.py::TestTensorExprFuser::test_ne, test/test_tensorexpr.py::TestTensorExprFuser::test_promotion, test/test_tensorexpr.py::TestTensorExprFuser::test_propagated_mem_layout, test/test_tensorexpr.py::TestTensorExprFuser::test_rand_like, test/test_tensorexpr.py::TestTensorExprFuser::test_rank_two, test/test_tensorexpr.py::TestTensorExprFuser::test_relu, test/test_tensorexpr.py::TestTensorExprFuser::test_remainder, test/test_tensorexpr.py::TestTensorExprFuser::test_reps, test/test_tensorexpr.py::TestTensorExprFuser::test_round_2, test/test_tensorexpr.py::TestTensorExprFuser::test_scalar, test/test_tensorexpr.py::TestTensorExprFuser::test_short, test/test_tensorexpr.py::TestTensorExprFuser::test_simple_add, test/test_tensorexpr.py::TestTensorExprFuser::test_sin_pow, test/test_tensorexpr.py::TestTensorExprFuser::test_slice, test/test_tensorexpr.py::TestTensorExprFuser::test_sliced_stride, test/test_tensorexpr.py::TestTensorExprFuser::test_softmax_cpu, test/test_tensorexpr.py::TestTensorExprFuser::test_softmax_cuda, test/test_tensorexpr.py::TestTensorExprFuser::test_strided_output_preserved, test/test_tensorexpr.py::TestTensorExprFuser::test_three_arg, test/test_tensorexpr.py::TestTensorExprFuser::test_three_arg2, test/test_tensorexpr.py::TestTensorExprFuser::test_transpose, test/test_tensorexpr.py::TestTensorExprFuser::test_unary_ops, test/test_tensorexpr.py::TestTensorExprFuser::test_unsqueeze, test/test_tensorexpr.py::TestTensorExprFuser::test_where 2025-09-07T09:31:23.7652230Z 2025-09-07T09:31:23.7652313Z Running test_tensorexpr_pybind 1/1 ... 
[2025-09-07 09:31:23.762996] 2025-09-07T09:31:23.7652479Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:31:23.7652862Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_tensorexpr_pybind.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:31:23.763201] 2025-09-07T09:31:25.8816771Z 2025-09-07T09:31:25.8823040Z test_tensorexpr_pybind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorexpr_pybind_1.1_d85062515d65eca4_.log 2025-09-07T09:31:25.8825693Z Running 17 items in this shard: test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_alloc_in_loop, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_call_raw, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dtype_error, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dynamic_shape, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_dynamic_shape_2d, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_external_calls, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_shape_prop, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_shape_prop_module, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_custom_lowering, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_expand, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_permute, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_scalar_inputs, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_t, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_tensor_inputs, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_kernel_with_transpose, test/test_tensorexpr_pybind.py::TestTensorExprPyBind::test_simple_sum, test/test_tensorexpr_pybind.py::TestExprHandlePyBind::test_unary_ops 
2025-09-07T09:31:25.8827801Z 2025-09-07T09:31:25.8827876Z Running test_testing 1/1 ... [2025-09-07 09:31:25.881622] 2025-09-07T09:31:25.8828025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-09-07T09:31:25.8828454Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_testing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:31:25.881972] 2025-09-07T09:31:51.4001717Z 2025-09-07T09:31:51.4002500Z test_testing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_testing_1.1_d881a813b62204d8_.log 2025-09-07T09:31:51.4293979Z Running 2073 items in this shard: test/test_testing.py::TestTestingCUDA::test_assertEqual_longMessage_cuda, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_bool, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int8, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_utils_test_suite_cuda, 
test/test_testing.py::TestTestingCUDA::test_get_supported_dtypes_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_bool, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_bool_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_equality_shortcut_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex64, 
test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float64, test/test_testing.py::TestTestingCUDA::test_setup_and_teardown_run_for_device_specific_tests_cuda, test/test_testing.py::TestTestingCUDA::test_supported_dtypes_abs_cuda, test/test_testing.py::TestFrameworkUtils::test_filtering_env_var, test/test_testing.py::TestAssertClose::test_bool, test/test_testing.py::TestAssertClose::test_default_tolerance_selection_mismatching_dtypes, test/test_testing.py::TestAssertClose::test_docstring_examples, test/test_testing.py::TestAssertClose::test_matching, test/test_testing.py::TestAssertClose::test_matching_atol, test/test_testing.py::TestAssertClose::test_matching_conjugate_bit, test/test_testing.py::TestAssertClose::test_matching_nan, test/test_testing.py::TestAssertClose::test_matching_nan_with_equal_nan, test/test_testing.py::TestAssertClose::test_matching_rtol, test/test_testing.py::TestAssertClose::test_meta, test/test_testing.py::TestAssertClose::test_mismatching_dtype, test/test_testing.py::TestAssertClose::test_mismatching_dtype_no_check, test/test_testing.py::TestAssertClose::test_mismatching_layout, test/test_testing.py::TestAssertClose::test_mismatching_layout_no_check, test/test_testing.py::TestAssertClose::test_mismatching_shape, test/test_testing.py::TestAssertClose::test_mismatching_stride, test/test_testing.py::TestAssertClose::test_mismatching_stride_no_check, test/test_testing.py::TestAssertClose::test_mismatching_types, test/test_testing.py::TestAssertClose::test_mismatching_types_subclasses, test/test_testing.py::TestAssertClose::test_mismatching_types_type_equality, test/test_testing.py::TestAssertClose::test_mismatching_values, test/test_testing.py::TestAssertClose::test_mismatching_values_atol, 
test/test_testing.py::TestAssertClose::test_mismatching_values_rtol, test/test_testing.py::TestAssertClose::test_none, test/test_testing.py::TestAssertClose::test_none_mismatch, test/test_testing.py::TestAssertClose::test_numpy, test/test_testing.py::TestAssertClose::test_only_atol, test/test_testing.py::TestAssertClose::test_only_rtol, test/test_testing.py::TestAssertClose::test_scalar, test/test_testing.py::TestAssertClose::test_unexpected_error_compare, test/test_testing.py::TestAssertClose::test_unexpected_error_originate, test/test_testing.py::TestAssertClose::test_unknown_layout, test/test_testing.py::TestAssertClose::test_unknown_type, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_cuda, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_no_check_cuda, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_atol, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_scalars, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_tensor_likes, test/test_testing.py::TestAssertCloseErrorMessage::test_mismatched_elements, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_callable, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_str, test/test_testing.py::TestAssertCloseErrorMessage::test_not_close, test/test_testing.py::TestAssertCloseErrorMessage::test_not_equal, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_rtol, test/test_testing.py::TestAssertCloseErrorMessage::test_small_float_dtype, test/test_testing.py::TestAssertCloseErrorMessage::test_zero_div_zero, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_keys, 
test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_values_msg, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_len, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_coalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_uncoalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_indices_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_nnz, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_sparse_dims, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_matching, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_matching, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_matching, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_matching, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_channel, 
test/test_testing.py::TestAssertCloseQuantized::test_matching_per_tensor, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_is_quantized, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_qscheme, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_uint8, test/test_testing.py::TestTestParametrization::test_apply_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_compose_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_default_names, test/test_testing.py::TestTestParametrization::test_modules_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_multiple_handling_of_same_param_error, test/test_testing.py::TestTestParametrization::test_name_fn, test/test_testing.py::TestTestParametrization::test_ops_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_reparametrize, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_1, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_2, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_3, test/test_testing.py::TestTestParametrization::test_subtest_names, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_6, 
test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_6, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_name_non_primitive_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_invalid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_valid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_list_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_decorator_applies_module_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_multiple_handling_of_same_param_error_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_name_fn_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_decorator_applies_op_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_param_specific_decoration_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_1_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_2_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_3_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_unparametrized_names_cuda, test/test_testing.py::TestImports::test_circular_dependencies, test/test_testing.py::TestImports::test_lazy_imports_are_lazy, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_functorch, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_torch, test/test_testing.py::TestImports::test_no_warning_on_import, test/test_testing.py::TestImports::test_not_import_sympy, test/test_testing.py::TestOpInfos::test_sample_input, test/test_testing.py::TestOpInfos::test_sample_input_metadata, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_T_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___radd___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rand___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rdiv___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmod___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmul___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___ror___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rpow___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rsub___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rxor___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators__chunk_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_aminmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_arange_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_as_strided_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_atan2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bernoulli_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_and_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_left_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_right_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_xor_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bucketize_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cauchy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_max_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_min_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_complex_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_copysign_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cov_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_embed_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diff_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_floor_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_no_rounding_mode_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_trunc_rounding_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_empty_permuted_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eye_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft2_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fliplr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_flipud_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_float_power_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_floor_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmod_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gather_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gcd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ge_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_geometric_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gradient_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_heaviside_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_histogramdd_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hypot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igamma_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igammac_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_isclose_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_item_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_kthvalue_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lcm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ldexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_le_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_cross_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_log_normal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logaddexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logcumsumexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_xor_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_fill_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_max_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_maximum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mean_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_median_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_min_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_minimum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_movedim_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mul_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_multinomial_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_native_layer_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ne_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_neg_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nextafter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv1d_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_embedding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_group_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hardtanh_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_huber_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_l1_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multi_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multilabel_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_prelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rms_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rrelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_softshrink_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_normal_in_place_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ormqr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_polar_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_pow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_remainder_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_renorm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_roll_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rot90_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rsub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_add_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_bartlett_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_blackman_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_gaussian_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hann_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_kaiser_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_nuttall_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_h_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_he_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_laguerre_polynomial_l_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_legendre_polynomial_p_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_xlog1py_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_zeta_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sum_to_size_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_take_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_trace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_tril_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_triu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_true_divide_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_uniform_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vdot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_where_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_xlogy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rsub___cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_or_cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_complex_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erf_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ge_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isneginf_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_xor_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_neg_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multi_margin_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_selu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_3_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hann_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_entr_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_trunc_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_H_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_T_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___getitem___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmatmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rxor___cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__batch_norm_with_update_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__chunk_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_lengths_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_offsets_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__softmax_backward_data_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_put_accumulate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__upsample_bilinear2d_aa_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_decomposed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_alias_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_all_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_allclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_aminmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_any_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_arange_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argsort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argwhere_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_partial_views_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_baddbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bernoulli_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bincount_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_or_cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_block_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_shapes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cartesian_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cauchy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chalf_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_inverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_column_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_combinations_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_constant_pad_nd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_corrcoef_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_count_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cov_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cross_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagflat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diff_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_einsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_permuted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_equal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfc_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eye_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flip_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fliplr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flipud_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_power_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gather_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geometric_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geqrf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gradient_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_half_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hash_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_histc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_prod_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_inner_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_istft_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_item_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_unary_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kron_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kthvalue_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lerp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cond_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cross_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_det_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eig_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_householder_product_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_multi_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_slogdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svdvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vander_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vecdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vector_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logcumsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_unpack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mH_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mT_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logaddexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matrix_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_msort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_multinomial_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mv_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmedian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanquantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nansum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_dropout_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_full_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_channel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_glu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mish_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_head_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_negative_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pdist_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rms_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_selu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_static_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_fro_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_inf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_nuc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_in_place_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_number_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ormqr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_outer_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pca_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pinverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pow_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_quantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rand_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ravel_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_renorm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_interleave_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_as_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize_as__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_roll_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rot90_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scalar_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_searchsorted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_mm_reduce_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j1_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_log_ndtr_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_xlog1py_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_list_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_multiple_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_to_size_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_along_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensor_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensordot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_sparse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_topk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__flash_attention_forward_cuda_float16, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapz_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triangular_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unflatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_uniform_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_consecutive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unravel_index_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_real_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zero__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_like_cuda_float32
2025-09-07T09:31:51.4571778Z
2025-09-07T09:31:51.4571863Z Running test_transformers 1/1 ... [2025-09-07 09:31:51.401921]
2025-09-07T09:31:51.4572023Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-09-07T09:31:51.4572450Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_transformers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-09-07 09:31:51.402136]
2025-09-07T09:34:59.9162539Z
2025-09-07T09:34:59.9163239Z PRINTING LOG FILE of test_transformers 1/1 (test/test-reports/test_transformers_1.1_62cb5af01563119b_.log)
2025-09-07T09:34:59.9164039Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
2025-09-07T09:34:59.9164660Z import pkg_resources
2025-09-07T09:34:59.9170865Z Test results will be stored in test-reports/python-pytest/test_transformers/test_transformers-8c1d8beb2ffa9920.xml
2025-09-07T09:34:59.9171456Z ============================= test session starts ==============================
2025-09-07T09:34:59.9171756Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-09-07T09:34:59.9171963Z cachedir: .pytest_cache
2025-09-07T09:34:59.9172200Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-09-07T09:34:59.9172456Z rootdir: /var/lib/jenkins/pytorch
2025-09-07T09:34:59.9172797Z configfile: pytest.ini
2025-09-07T09:34:59.9173926Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0
2025-09-07T09:34:59.9174315Z collecting ... collected 12244 items
2025-09-07T09:34:59.9174511Z stepcurrent: Cannot find last run test, not skipping
2025-09-07T09:35:00.2555260Z Running 12244 items in this shard: test/test_transformers.py::TestTransformersCUDA::test_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_only_layer_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_disable_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_is_causal_gpu_cuda, test/test_transformers.py::TestTransformersCUDA::test_kpm_mask_trailing_column_with_nested_tensor_cuda, test/test_transformers.py::TestTransformersCUDA::test_mask_check_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_math_backend_high_precision_cuda,
test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_mha_in_proj_weight_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_self_attn_TxT_attn_mask_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_pad_and_catch_error_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformer_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_3_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_1_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_8_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_model_cuda, test/test_transformers.py::TestTransformersCUDA::test_with_nested_tensor_input_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_dispatch_fails_no_backend_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_atteention_large_bf16_nan_values_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_attention_fail_with_non_square_causal_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_bfloat16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_float16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_fail_fp32_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_error_cases_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_attn_mask_present_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel0_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sdpa_kernel_grouped_query_attention_cuda_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_error_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_large_seq_len_uniform_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_efficient_fail_bfloat16_less_than_sm80_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_nested_fails_on_padding_head_dim_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_unaligned_tensors_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_math_with_negative_scale_kernel0_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_False_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_d256_heuristic_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_fail_d128_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_gqa_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_nonmodulo64seqlen_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_preserves_query_layout_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_trivial_output_transpose_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_query_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_nested_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contig_mask_bug_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_backwards_determinism_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_2_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_3_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_4_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel1_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_float16_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_singelton_head_dim_stride_ne_1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape1_cuda, 
test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_and_mask_fails_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape3_cuda 2025-09-07T09:35:00.5795695Z 2025-09-07T09:35:00.5795888Z test_transformers.py::TestTransformersCUDA::test_bias_is_none_cuda PASSED [0.0353s] [ 0%] 2025-09-07T09:35:00.5796270Z test_transformers.py::TestTransformersCUDA::test_decoder_only_layer_cuda SKIPPED [0.0002s] (Fairseq not found) [ 0%] 2025-09-07T09:35:00.5796701Z test_transformers.py::TestTransformersCUDA::test_decoder_padding_and_src_mask_bool_cuda PASSED [0.2426s] [ 0%] 2025-09-07T09:35:00.5797032Z test_transformers.py::TestTransformersCUDA::test_disable_fastpath_cuda PASSED [0.4871s] [ 0%] 2025-09-07T09:35:00.5797343Z test_transformers.py::TestTransformersCUDA::test_encoder_is_causal_cuda PASSED [0.0425s] [ 0%] 2025-09-07T09:35:00.5797660Z test_transformers.py::TestTransformersCUDA::test_encoder_padding_and_src_mask_bool_cuda PASSED [0.1376s] [ 0%] 2025-09-07T09:35:00.5798067Z test_transformers.py::TestTransformersCUDA::test_is_causal_gpu_cuda SKIPPED 
[0.0007s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5798515Z test_transformers.py::TestTransformersCUDA::test_kpm_mask_trailing_column_with_nested_tensor_cuda PASSED [0.1375s] [ 0%] 2025-09-07T09:35:00.5798881Z test_transformers.py::TestTransformersCUDA::test_mask_check_fastpath_cuda PASSED [0.0106s] [ 0%] 2025-09-07T09:35:00.5799201Z test_transformers.py::TestTransformersCUDA::test_math_backend_high_precision_cuda PASSED [3.1201s] [ 0%] 2025-09-07T09:35:00.5799534Z test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_False_cuda PASSED [0.0030s] [ 0%] 2025-09-07T09:35:00.5799895Z test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_True_cuda PASSED [0.0023s] [ 0%] 2025-09-07T09:35:00.5800232Z test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_False_cuda PASSED [0.0019s] [ 0%] 2025-09-07T09:35:00.5800685Z test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_True_cuda PASSED [0.0020s] [ 0%] 2025-09-07T09:35:00.5801139Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_bool_cuda PASSED [1.9953s] [ 0%] 2025-09-07T09:35:00.5802000Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_float32_cuda SKIPPED [0.0005s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157060 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 0%] 2025-09-07T09:35:00.5803197Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_bool_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157038 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5804027Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_float32_cuda PASSED [0.0041s] [ 0%] 2025-09-07T09:35:00.5804553Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_bool_cuda SKIPPED [0.0006s] (boolean mask is not fully supported on ROCm yet.) [ 0%] 2025-09-07T09:35:00.5805091Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_float32_cuda PASSED [0.0031s] [ 0%] 2025-09-07T09:35:00.5805620Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_cuda SKIPPED [0.0005s] (boolean mask is not fully supported on ROCm yet.) [ 0%] 2025-09-07T09:35:00.5806283Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_float32_cuda PASSED [0.0027s] [ 0%] 2025-09-07T09:35:00.5806964Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_bool_cuda SKIPPED [0.0005s] (boolean mask is not fully supported on ROCm yet.) 
[ 0%] 2025-09-07T09:35:00.5807477Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_float32_cuda PASSED [0.0027s] [ 0%] 2025-09-07T09:35:00.5808010Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_cuda SKIPPED [0.0005s] (boolean mask is not fully supported on ROCm yet.) [ 0%] 2025-09-07T09:35:00.5808548Z test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_float32_cuda PASSED [0.0027s] [ 0%] 2025-09-07T09:35:00.5808994Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda PASSED [0.0416s] [ 0%] 2025-09-07T09:35:00.5809431Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda PASSED [0.0042s] [ 0%] 2025-09-07T09:35:00.5809845Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda PASSED [0.0038s] [ 0%] 2025-09-07T09:35:00.5810702Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157061 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5811885Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157039 for platform(s) rocm. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5813038Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157091 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5813842Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda PASSED [0.0025s] [ 0%] 2025-09-07T09:35:00.5814258Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda PASSED [0.0027s] [ 0%] 2025-09-07T09:35:00.5814670Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda PASSED [0.0025s] [ 0%] 2025-09-07T09:35:00.5815467Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157062 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5816736Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157040 for platform(s) rocm. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5817946Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157092 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5819149Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/131086 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5820295Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/131146 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5821482Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/131123 for platform(s) linux, rocm. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5822289Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda PASSED [0.0035s] [ 0%] 2025-09-07T09:35:00.5822709Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda PASSED [0.0033s] [ 0%] 2025-09-07T09:35:00.5823134Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda PASSED [0.0033s] [ 0%] 2025-09-07T09:35:00.5823926Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157063 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5825110Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157041 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5826280Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda SKIPPED [0.0002s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157093 for platform(s) rocm. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5827174Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda PASSED [0.0026s] [ 0%] 2025-09-07T09:35:00.5827605Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda PASSED [0.0025s] [ 0%] 2025-09-07T09:35:00.5828017Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda PASSED [0.0020s] [ 0%] 2025-09-07T09:35:00.5828816Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157064 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5829970Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157042 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5831144Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157094 for platform(s) rocm. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5832352Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/129853 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5833540Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/131107 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5834706Z test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/131179 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 0%] 2025-09-07T09:35:00.5835438Z test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda PASSED [0.3168s] [ 0%] 2025-09-07T09:35:00.5835764Z test_transformers.py::TestTransformersCUDA::test_script_mha_in_proj_weight_none_cuda PASSED [0.0098s] [ 0%] 2025-09-07T09:35:00.5836177Z test_transformers.py::TestTransformersCUDA::test_self_attn_TxT_attn_mask_cuda SKIPPED [0.0001s] (4D mask not supported yet - activate when 4D mask supported) [ 0%] 2025-09-07T09:35:00.5836663Z test_transformers.py::TestTransformersCUDA::test_train_with_is_causal_cuda PASSED [1.4333s] [ 0%] 2025-09-07T09:35:00.5837093Z test_transformers.py::TestTransformersCUDA::test_train_with_pad_and_catch_error_cuda SKIPPED [0.0010s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-09-07T09:35:00.5837503Z test_transformers.py::TestTransformersCUDA::test_transformer_bias_is_none_cuda PASSED [0.0161s] [ 0%] 2025-09-07T09:35:00.5837900Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_cuda PASSED [0.0169s] [ 0%] 2025-09-07T09:35:00.5838371Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_cuda PASSED [0.0167s] [ 0%] 2025-09-07T09:35:00.5838836Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_cuda PASSED [0.0187s] [ 0%] 2025-09-07T09:35:00.5839288Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_cuda PASSED [0.0186s] [ 0%] 2025-09-07T09:35:00.5839754Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_cuda PASSED [0.0152s] [ 0%] 2025-09-07T09:35:00.5840219Z 
test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_cuda PASSED [0.0267s] [ 0%] 2025-09-07T09:35:00.5840675Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_cuda PASSED [0.0193s] [ 0%] 2025-09-07T09:35:00.5841139Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_cuda PASSED [0.0195s] [ 0%] 2025-09-07T09:35:00.5841725Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_cuda SKIPPED [0.0009s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5842448Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_256_cuda SKIPPED [0.0006s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5843128Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_cuda SKIPPED [0.0005s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5843792Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_256_cuda SKIPPED [0.0004s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5844483Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_cuda SKIPPED [0.0004s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5845153Z 
test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_256_cuda SKIPPED [0.0005s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5845848Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_cuda SKIPPED [0.0004s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5846592Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda SKIPPED [0.0004s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 0%] 2025-09-07T09:35:00.5847571Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_False_enable_nested_tensor_False_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157127 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5848837Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_True_enable_nested_tensor_False_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157065 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 0%] 2025-09-07T09:35:00.5850046Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_False_enable_nested_tensor_False_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157095 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5851289Z test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_True_enable_nested_tensor_False_cuda SKIPPED [0.0003s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/157043 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 0%] 2025-09-07T09:35:00.5852116Z test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_3_cuda PASSED [0.0017s] [ 0%] 2025-09-07T09:35:00.5852524Z test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_4_cuda PASSED [0.0014s] [ 0%] 2025-09-07T09:35:00.5852923Z test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_1_cuda PASSED [0.0056s] [ 0%] 2025-09-07T09:35:00.5853295Z test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_4_cuda PASSED [0.0019s] [ 0%] 2025-09-07T09:35:00.5853654Z test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_8_cuda PASSED [0.1748s] [ 0%] 2025-09-07T09:35:00.5854034Z test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda PASSED [0.1326s] [ 0%] 2025-09-07T09:35:00.5854398Z 
test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_model_cuda PASSED [0.5276s] [ 0%] 2025-09-07T09:35:00.5854775Z test_transformers.py::TestTransformersCUDA::test_with_nested_tensor_input_cuda PASSED [0.0980s] [ 0%] 2025-09-07T09:35:00.5855115Z test_transformers.py::TestSDPAFailureModesCUDA::test_dispatch_fails_no_backend_cuda PASSED [0.0030s] [ 0%] 2025-09-07T09:35:00.5855470Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_atteention_large_bf16_nan_values_cuda PASSED [0.0043s] [ 0%] 2025-09-07T09:35:00.5855863Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_attention_fail_with_non_square_causal_attention_cuda PASSED [0.0030s] [ 0%] 2025-09-07T09:35:00.5856239Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_bfloat16_cuda PASSED [0.0039s] [ 0%] 2025-09-07T09:35:00.5856643Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_float16_cuda PASSED [0.0035s] [ 0%] 2025-09-07T09:35:00.5857102Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_0_cuda SKIPPED [0.0001s] (Does not support fused SDPA or not SM86+ hardware) [ 0%] 2025-09-07T09:35:00.5857678Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_2_cuda SKIPPED [0.0001s] (Does not support fused SDPA or not SM86+ hardware) [ 0%] 2025-09-07T09:35:00.5858242Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_0_cuda SKIPPED [0.0001s] (Does not support fused SDPA or not SM86+ hardware) [ 0%] 2025-09-07T09:35:00.5858781Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_2_cuda SKIPPED [0.0001s] (Does not support fused SDPA or not SM86+ hardware) [ 0%] 2025-09-07T09:35:00.5859262Z test_transformers.py::TestSDPAFailureModesCUDA::test_flash_fail_fp32_cuda PASSED [0.0022s] [ 0%] 
2025-09-07T09:35:00.5859634Z test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_error_cases_cuda PASSED [0.0016s] [ 0%] 2025-09-07T09:35:00.5860041Z test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda PASSED [0.0019s] [ 0%] 2025-09-07T09:35:00.5860438Z test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel0_cuda PASSED [0.0014s] [ 0%] 2025-09-07T09:35:00.5860833Z test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel1_cuda PASSED [0.0012s] [ 0%] 2025-09-07T09:35:00.5861211Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_attn_mask_present_kernel0_cuda PASSED [0.0006s] [ 0%] 2025-09-07T09:35:00.5861606Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel0_cuda PASSED [0.0006s] [ 0%] 2025-09-07T09:35:00.5861975Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel1_cuda PASSED [0.0006s] [ 0%] 2025-09-07T09:35:00.5862341Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel0_cuda PASSED [0.0011s] [ 0%] 2025-09-07T09:35:00.5862720Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel1_cuda PASSED [0.0010s] [ 0%] 2025-09-07T09:35:00.5863090Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel0_cuda PASSED [0.0006s] [ 0%] 2025-09-07T09:35:00.5863462Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel1_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5863843Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel0_cuda PASSED [0.0011s] [ 0%] 2025-09-07T09:35:00.5864223Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel1_cuda PASSED [0.0006s] [ 0%] 
2025-09-07T09:35:00.5864594Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel0_cuda PASSED [0.0010s] [ 0%] 2025-09-07T09:35:00.5864992Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel1_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5865380Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel2_cuda PASSED [0.0011s] [ 0%] 2025-09-07T09:35:00.5865767Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel0_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5866148Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel1_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5866596Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel2_cuda PASSED [0.0006s] [ 0%] 2025-09-07T09:35:00.5866978Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel0_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5867348Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel1_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5867755Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel2_cuda PASSED [0.0005s] [ 0%] 2025-09-07T09:35:00.5868151Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel0_cuda PASSED [0.0014s] [ 1%] 2025-09-07T09:35:00.5868489Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel1_cuda PASSED [0.0011s] [ 1%] 2025-09-07T09:35:00.5868878Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sdpa_kernel_grouped_query_attention_cuda_fused_kernel0_cuda PASSED [0.0011s] [ 1%] 2025-09-07T09:35:00.5869274Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel0_cuda PASSED [0.0010s] [ 1%] 
2025-09-07T09:35:00.5869620Z test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel1_cuda PASSED [0.0010s] [ 1%] 2025-09-07T09:35:00.5870083Z test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel0_cuda SKIPPED [0.0001s] (Efficient or cuDNN Attention was not built for this system) [ 1%] 2025-09-07T09:35:00.5870594Z test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel1_cuda SKIPPED [0.0001s] (Efficient or cuDNN Attention was not built for this system) [ 1%] 2025-09-07T09:35:00.5871039Z test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_cuda PASSED [0.6762s] [ 1%] 2025-09-07T09:35:00.5871452Z test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_error_cuda PASSED [0.0019s] [ 1%] 2025-09-07T09:35:00.5871845Z test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_large_seq_len_uniform_attention_cuda PASSED [0.2994s] [ 1%] 2025-09-07T09:35:00.5872358Z test_transformers.py::TestSDPAFailureModesCUDA::test_mem_efficient_fail_bfloat16_less_than_sm80_cuda SKIPPED [0.0002s] (Current platform does not support fused SDPA or is an SM80+ device.) 
[ 1%] 2025-09-07T09:35:00.5872834Z test_transformers.py::TestSDPAFailureModesCUDA::test_nested_fails_on_padding_head_dim_cuda PASSED [0.0037s] [ 1%] 2025-09-07T09:35:00.5873170Z test_transformers.py::TestSDPAFailureModesCUDA::test_unaligned_tensors_cuda PASSED [0.0007s] [ 1%] 2025-09-07T09:35:00.5873524Z test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_math_with_negative_scale_kernel0_cuda PASSED [0.0015s] [ 1%] 2025-09-07T09:35:00.5873875Z test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_False_cuda PASSED [0.0707s] [ 1%] 2025-09-07T09:35:00.5874202Z test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_True_cuda PASSED [0.0073s] [ 1%] 2025-09-07T09:35:00.5874622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_d256_heuristic_cuda SKIPPED [0.0001s] (cuDNN Attention is not supported on this system) [ 1%] 2025-09-07T09:35:00.5875088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_different_dk_dv_cuda SKIPPED [0.0001s] (cuDNN Attention is not supported on this system) [ 1%] 2025-09-07T09:35:00.5875517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_fail_d128_cuda SKIPPED [0.0001s] (broken as of cuDNN 9.10) [ 1%] 2025-09-07T09:35:00.5875926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_gqa_cuda SKIPPED [0.0001s] (cuDNN Attention is not supported on this system) [ 1%] 2025-09-07T09:35:00.5876384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_nonmodulo64seqlen_cuda SKIPPED [0.0001s] (cudnn Attention is not supported on this system) [ 1%] 2025-09-07T09:35:00.5876941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_preserves_query_layout_cuda SKIPPED [0.0001s] (cudnn Attention is not supported on this system) [ 1%] 2025-09-07T09:35:00.5877426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_trivial_output_transpose_cuda SKIPPED [0.0001s] (cudnn Attention is not supported on 
this system) [ 1%] 2025-09-07T09:35:00.5878011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0145s] [ 1%] 2025-09-07T09:35:00.5878683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 1%] 2025-09-07T09:35:00.5879311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 1%] 2025-09-07T09:35:00.5879951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 1%] 2025-09-07T09:35:00.5880596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5881229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5881877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5882553Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 1%] 2025-09-07T09:35:00.5883194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0127s] [ 1%] 2025-09-07T09:35:00.5883843Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5884461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5885096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 1%] 2025-09-07T09:35:00.5885726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 1%] 2025-09-07T09:35:00.5886350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5887067Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5887720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0033s] [ 1%] 2025-09-07T09:35:00.5888525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 [W907 09:32:05.964746426 attention.cpp:916] Warning: Dropout mask should only be used for testing purposes. (function operator()) 2025-09-07T09:35:00.5889055Z PASSED [0.0138s] [ 1%] 2025-09-07T09:35:00.5889442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 1%] 2025-09-07T09:35:00.5890088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 1%] 2025-09-07T09:35:00.5890721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5891345Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 1%] 2025-09-07T09:35:00.5892021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 1%] 2025-09-07T09:35:00.5892671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 1%] 2025-09-07T09:35:00.5893306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 1%] 2025-09-07T09:35:00.5893944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0130s] [ 1%] 2025-09-07T09:35:00.5894570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 1%] 2025-09-07T09:35:00.5895212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5895842Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5922394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5923173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5923796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0036s] [ 1%] 2025-09-07T09:35:00.5924388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 1%] 2025-09-07T09:35:00.5924982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 1%] 2025-09-07T09:35:00.5925588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5926186Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5926892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5927512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5928195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5928791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5929383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5929976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 1%] 2025-09-07T09:35:00.5930589Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5931179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 1%] 2025-09-07T09:35:00.5931773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5932388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5933027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5933617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 1%] 2025-09-07T09:35:00.5934212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 1%] 2025-09-07T09:35:00.5934883Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5935636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5936399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5937267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5938023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5938802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 1%] 2025-09-07T09:35:00.5939760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5940512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5941255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5942032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5942883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5943626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5944365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5945111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5945878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5946700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5947446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5948196Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5948949Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5949695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5950445Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5951232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5951998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 1%] 2025-09-07T09:35:00.5952745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5953487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5954227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5954986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5955758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5956630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5957449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5958191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5958939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5959681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5960448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5961214Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5961963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5962743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5963492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5964256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5965019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5965759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5966565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5967306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5968043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5968783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5969554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5970310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5971079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 1%] 2025-09-07T09:35:00.5971751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0104s] [ 1%] 2025-09-07T09:35:00.5972345Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 1%] 2025-09-07T09:35:00.5972938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 1%] 2025-09-07T09:35:00.5973546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5974157Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5974748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5975337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5975930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5976579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0109s] [ 2%] 2025-09-07T09:35:00.5977165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5977750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0034s] [ 2%] 2025-09-07T09:35:00.5978361Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5979000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5979586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5980170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5980758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5981343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 2%] 2025-09-07T09:35:00.5981939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 2%] 2025-09-07T09:35:00.5982553Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 2%] 2025-09-07T09:35:00.5983158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5983748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5984341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5984931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5985520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5986105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0099s] [ 2%] 2025-09-07T09:35:00.5986753Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5987379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5987989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5988574Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5989163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5989749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5990333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5990920Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5991537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.5992146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5992736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5993326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5993921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5994511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5995100Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5995686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5996294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5997117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5997699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.5998289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5998878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.5999465Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.6000080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 2%] 2025-09-07T09:35:00.6000768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6001512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6002253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6003011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6003750Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6004490Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6005265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6006028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6006913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6007645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 2%] 2025-09-07T09:35:00.6008374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6009137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6009891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6010630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6011365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6012099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6012835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6013578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6014358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6015114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6015855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6016767Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6017508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6018283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6019113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6019849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6020583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 2%] 2025-09-07T09:35:00.6021315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6022055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6022792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6023573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6024324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6025060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6025800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6026619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6027397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6028158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6028899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6029639Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6030381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6031126Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6031864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6032630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6033379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 2%] 2025-09-07T09:35:00.6034109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6034849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6035587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6036342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 2%] 2025-09-07T09:35:00.6037085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0107s] [ 2%] 2025-09-07T09:35:00.6037675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 2%] 2025-09-07T09:35:00.6038262Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6038847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6039432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 2%] 2025-09-07T09:35:00.6040020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 2%] 2025-09-07T09:35:00.6040605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6041229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6041828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 2%] 2025-09-07T09:35:00.6042403Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 2%] 2025-09-07T09:35:00.6042977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6043553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 2%] 2025-09-07T09:35:00.6044133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 2%] 2025-09-07T09:35:00.6044711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 2%] 2025-09-07T09:35:00.6045310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 2%] 2025-09-07T09:35:00.6045909Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6046568Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0101s] [ 2%] 2025-09-07T09:35:00.6047153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6047735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.6048319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.6048908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 2%] 2025-09-07T09:35:00.6049499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6050125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 2%] 2025-09-07T09:35:00.6050740Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.6051319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0098s] [ 2%] 2025-09-07T09:35:00.6051897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 2%] 2025-09-07T09:35:00.6052477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.6053053Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 2%] 2025-09-07T09:35:00.6053630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6054253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 2%] 2025-09-07T09:35:00.6054861Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 3%] 2025-09-07T09:35:00.6055440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6056022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6056675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6057257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6057837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6058420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6059150Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6059761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6060344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6060928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6061509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6062086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6062660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6063267Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6063887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.6064466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6065049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 3%] 2025-09-07T09:35:00.6065710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6066448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6067259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6068018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6068770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6069506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6070244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6070979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6071708Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6072460Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6073205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6073926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6074652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6075384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 
2025-09-07T09:35:00.6076113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6077046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6077794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6078531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6079264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6079998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6080731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6081491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6082243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6082979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6083711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6084441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6085170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6085896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6086734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6087484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6088214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6088942Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6089673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6090451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6091200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6091929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6092661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
3%] 2025-09-07T09:35:00.6093398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6096473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6097295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6098083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6098839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6099621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6100366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6101094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6101864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6102616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6103343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6109499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0051s] [ 
3%] 2025-09-07T09:35:00.6110162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 3%] 2025-09-07T09:35:00.6110771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 3%] 2025-09-07T09:35:00.6111371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 3%] 2025-09-07T09:35:00.6111977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 3%] 2025-09-07T09:35:00.6112638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 3%] 2025-09-07T09:35:00.6113280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 3%] 2025-09-07T09:35:00.6113906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 3%] 2025-09-07T09:35:00.6114528Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0034s] [ 3%] 2025-09-07T09:35:00.6115134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 3%] 2025-09-07T09:35:00.6120635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0037s] [ 3%] 2025-09-07T09:35:00.6121418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0036s] [ 3%] 2025-09-07T09:35:00.6122094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 3%] 2025-09-07T09:35:00.6122734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 3%] 2025-09-07T09:35:00.6123338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0038s] [ 3%] 2025-09-07T09:35:00.6123938Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 3%] 2025-09-07T09:35:00.6124545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 3%] 2025-09-07T09:35:00.6125151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 3%] 2025-09-07T09:35:00.6125754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0047s] [ 3%] 2025-09-07T09:35:00.6126392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 3%] 2025-09-07T09:35:00.6127131Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 3%] 2025-09-07T09:35:00.6129374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 3%] 2025-09-07T09:35:00.6129991Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 3%] 2025-09-07T09:35:00.6130597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0041s] [ 3%] 2025-09-07T09:35:00.6131190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0047s] [ 3%] 2025-09-07T09:35:00.6132017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 3%] 2025-09-07T09:35:00.6133046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0041s] [ 3%] 2025-09-07T09:35:00.6136224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0041s] [ 3%] 2025-09-07T09:35:00.6136953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 3%] 2025-09-07T09:35:00.6137770Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0034s] [ 3%] 2025-09-07T09:35:00.6138610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0047s] [ 3%] 2025-09-07T09:35:00.6139347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0041s] [ 3%] 2025-09-07T09:35:00.6141970Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 3%] 2025-09-07T09:35:00.6142564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 3%] 2025-09-07T09:35:00.6143214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 3%] 2025-09-07T09:35:00.6143845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 3%] 2025-09-07T09:35:00.6144439Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 3%] 2025-09-07T09:35:00.6145038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.6145687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 3%] 2025-09-07T09:35:00.6146285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 3%] 2025-09-07T09:35:00.6146942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.6147565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.6184220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0040s] [ 3%] 2025-09-07T09:35:00.6184820Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0038s] [ 3%] 2025-09-07T09:35:00.6185407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.6185996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.6186733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0040s] [ 3%] 2025-09-07T09:35:00.6187321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 3%] 2025-09-07T09:35:00.6187991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6188800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6189570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6190314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6192299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6193051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6193824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6194586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 3%] 2025-09-07T09:35:00.6195325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6196061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6196866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6197598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6198334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6199102Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6199864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6201660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6202406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6203150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6203918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 4%] 2025-09-07T09:35:00.6204673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6205425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6206173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6206998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6207740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6208479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6210289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6211054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6211788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6212527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6213266Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6214036Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6214794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6215531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6216276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6217087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6218863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 4%] 2025-09-07T09:35:00.6219702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6220501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6221269Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6222017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6222764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6223510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6224271Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6225028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6225766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6226576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6228359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6229106Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6229774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 4%] 2025-09-07T09:35:00.6230424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 4%] 2025-09-07T09:35:00.6231036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.6231622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 4%] 2025-09-07T09:35:00.6232215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 4%] 2025-09-07T09:35:00.6232815Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 4%] 2025-09-07T09:35:00.6233401Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 4%] 2025-09-07T09:35:00.6233986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6235625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0087s] [ 4%] 2025-09-07T09:35:00.6236236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 4%] 2025-09-07T09:35:00.6236887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.6237470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6238059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 4%] 2025-09-07T09:35:00.6238650Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 4%] 2025-09-07T09:35:00.6239237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.6239820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6240446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.6241057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 4%] 2025-09-07T09:35:00.6242690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 4%] 2025-09-07T09:35:00.6243300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.6243897Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6244489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6245112Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 4%] 2025-09-07T09:35:00.6245720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.6246306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.6246968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6247552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0040s] [ 4%] 2025-09-07T09:35:00.6248139Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 4%] 2025-09-07T09:35:00.6248728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6250344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6250975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.6251604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 4%] 2025-09-07T09:35:00.6252198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 4%] 2025-09-07T09:35:00.6252790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6253386Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 4%] 2025-09-07T09:35:00.6253981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.6254579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6255199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 4%] 2025-09-07T09:35:00.6255823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 4%] 2025-09-07T09:35:00.6257519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.6258111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6258702Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6259343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.6259947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 4%] 2025-09-07T09:35:00.6260531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6261167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 4%] 2025-09-07T09:35:00.6261778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0040s] [ 4%] 2025-09-07T09:35:00.6262364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 4%] 2025-09-07T09:35:00.6263034Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6263785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6265547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6266321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6267148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6267893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 4%] 2025-09-07T09:35:00.6268638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6269381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6270118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6270854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6271625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6272375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6274148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6274892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6275631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6276398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6277227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6277969Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6278710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6279450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6280195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6280942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6282758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6283525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6284268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6285010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6285754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6286595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6287351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6288090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6288824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6289557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6290300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6292081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6292861Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 4%] 2025-09-07T09:35:00.6293621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6294365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6295114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6295860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6296702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6297462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6298202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6298939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6300792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6301529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6302306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6303067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6303798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6304463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0051s] [ 5%] 2025-09-07T09:35:00.6305052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 5%] 2025-09-07T09:35:00.6305636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6306216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6306920Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 5%] 2025-09-07T09:35:00.6308564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6309152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6309754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6310340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0040s] [ 5%] 2025-09-07T09:35:00.6310922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 5%] 2025-09-07T09:35:00.6311496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6312072Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6312686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 5%] 2025-09-07T09:35:00.6313288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 5%] 2025-09-07T09:35:00.6313866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6314443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6316055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0042s] [ 5%] 2025-09-07T09:35:00.6316715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6317296Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6317913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6318522Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6319107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6319693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6320280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6321252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0002s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/158890 
for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 5%] 2025-09-07T09:35:00.6322208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 5%] 2025-09-07T09:35:00.6323885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0034s] [ 5%] 2025-09-07T09:35:00.6324490Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.6325069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6325658Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6326240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.6326960Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6327576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6328181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6328768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6329347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6329931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6331565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6332156Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6332743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 5%] 2025-09-07T09:35:00.6333328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6333949Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6334559Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.6335136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6335719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6336303Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6336957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.6338611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6339367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6340110Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6340846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6341579Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6342317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6343057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6343840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6344594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6345326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 5%] 2025-09-07T09:35:00.6347178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6347911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6348671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6349420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6350156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6350884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6351612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6352345Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6353085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6353861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6354615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6356439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6357274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6358015Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6358776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6359528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6360261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6360992Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6361720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6362452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6363187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6363953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6364701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
5%] 2025-09-07T09:35:00.6366466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6367276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6368018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6368751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6369515Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6370273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6371012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6371749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6372483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6373211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6375411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6376165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6376955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6377697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6378431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6379225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 5%] 2025-09-07T09:35:00.6379950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 5%] 2025-09-07T09:35:00.6380560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 5%] 
2025-09-07T09:35:00.6381143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6381725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.6382314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6383918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 5%] 2025-09-07T09:35:00.6384505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6385088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6385697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 5%] 2025-09-07T09:35:00.6386293Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6386936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6387516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.6388106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.6388687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6389264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6390886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6391502Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 6%] 2025-09-07T09:35:00.6392088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6392676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6393263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6393854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6394445Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6395033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6395642Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6396242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 6%] 2025-09-07T09:35:00.6396897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6398475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.6399055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6399635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6400257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6400854Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6401431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6402012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6402597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6403183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6403763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6405310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6405909Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6406610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6407215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6407795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6408376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6408954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6409529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6410135Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6410745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6411327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6412931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6413597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6414340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6415076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6415812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6416687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6417450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6418190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6418927Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6419724Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6421505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6422261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6423006Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6423735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6424467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 
2025-09-07T09:35:00.6425198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6425925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6426782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6427537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6428270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6430010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6430751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6431521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6432279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6433016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6433750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6434482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0006s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6435210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6435938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6436770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6437522Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6439247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6439980Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6440717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6441497Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6442256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6442990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6443724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
6%] 2025-09-07T09:35:00.6444461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6445199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6445939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6447790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6448541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6449274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6450008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6450740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6451524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6452275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6453030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 6%] 2025-09-07T09:35:00.6453697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 6%] 
2025-09-07T09:35:00.6454286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6455863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6456450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6457115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6457734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6458335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6458916Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6459564Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.6460145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 6%] 2025-09-07T09:35:00.6460719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6461291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6461900Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6463506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 6%] 2025-09-07T09:35:00.6464085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6464667Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6465257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0072s] [ 6%] 2025-09-07T09:35:00.6465847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6466433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6467084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6467711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6468323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6468910Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6470481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6471071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6471657Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6472236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6472848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6473454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6474034Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6474612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6475191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6475771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6476354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6478011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6478633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6479247Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.6479838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 6%] 2025-09-07T09:35:00.6480431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.6481017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.6481598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6482176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6482775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 7%] 2025-09-07T09:35:00.6483371Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6484931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6485518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6486102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 7%] 2025-09-07T09:35:00.6486770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6487429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6488205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 7%] 2025-09-07T09:35:00.6488968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6489703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6490440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6491182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6492904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6493670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6494425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6495155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6495884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6496690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6497416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6498171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6498919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6499690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6500427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6502176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6502914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6503688Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6504446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6505182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6505923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6506740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6507473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 7%] 2025-09-07T09:35:00.6508238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6508990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6510722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6511465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6512201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6512928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6513687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6514439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6515172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6515910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6516714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6517450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6519227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6519999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6520735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6521471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6522205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6522936Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6523690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6524438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6525171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6525899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6526693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 
2025-09-07T09:35:00.6528352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 7%] 2025-09-07T09:35:00.6528934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 7%] 2025-09-07T09:35:00.6529570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6530163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6530739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 7%] 2025-09-07T09:35:00.6531319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6531899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6532478Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6533055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 7%] 2025-09-07T09:35:00.6533652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 7%] 2025-09-07T09:35:00.6535225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6535799Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6536370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 7%] 2025-09-07T09:35:00.6537006Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 7%] 2025-09-07T09:35:00.6537579Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6538151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6538727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 7%] 2025-09-07T09:35:00.6539400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6540000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6540575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6541153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6542737Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6543321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6543901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 7%] 2025-09-07T09:35:00.6544503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0042s] [ 7%] 2025-09-07T09:35:00.6545099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6545669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6546238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6546862Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6547439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6548013Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6549580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6550205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6550806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6551387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6551965Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6552543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6553123Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6553700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6554305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6554902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6555475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6557148Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6557726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6558304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 7%] 2025-09-07T09:35:00.6558882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 7%] 2025-09-07T09:35:00.6559457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 7%] 2025-09-07T09:35:00.6560071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 7%] 2025-09-07T09:35:00.6560747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6561483Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6562215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6562942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6564685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6565458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6566214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 
2025-09-07T09:35:00.6567004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6567732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6568456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6569179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6569941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6570684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 7%] 2025-09-07T09:35:00.6571412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6573151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6573880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6574609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6575369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 7%] 2025-09-07T09:35:00.6576115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6576917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6577651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6578385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6579167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6579896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6580661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6582415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6583138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6583858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6584585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6585338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6586082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6586878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6587608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6588340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6589065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6590814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6591577Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6592334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6593068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6593800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6594524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6595269Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 
2025-09-07T09:35:00.6596011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6596788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6597512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6599250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6599985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6600725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 8%] 2025-09-07T09:35:00.6601430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 8%] 2025-09-07T09:35:00.6602041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6602627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6603214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6603805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 8%] 2025-09-07T09:35:00.6604392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 8%] 2025-09-07T09:35:00.6604980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6605606Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6607320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 8%] 2025-09-07T09:35:00.6607900Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 8%] 2025-09-07T09:35:00.6608480Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6609060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6609643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6610227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 8%] 2025-09-07T09:35:00.6610502Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6610821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 8%] 2025-09-07T09:35:00.6611120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6611399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6611676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6611955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6612237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6612514Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6613828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6614127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6614400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6614672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6614945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6615220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6615499Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6615776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6616076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6616366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6616705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6616991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6617268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6617544Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6617857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6618159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6618435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6618709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6619028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6619302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6619576Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6620844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6621154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6621455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 8%] 2025-09-07T09:35:00.6621729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 8%] 2025-09-07T09:35:00.6622003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 8%] 2025-09-07T09:35:00.6622359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6622713Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6623065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6623435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6623805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6624156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6624505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 8%] 2025-09-07T09:35:00.6624854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6625200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6625567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6625936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6626282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6626690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6627039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6627384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6628743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6629120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6629470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6629822Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6630174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6630526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6630903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6631276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6631627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6631978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6632329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6632675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6633036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6633400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6633749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6634097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6634446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6634796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6635161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6635528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6636914Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6637273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6637629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6637977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6638364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6638735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 8%] 2025-09-07T09:35:00.6639083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6639433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6639778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 8%] 2025-09-07T09:35:00.6640127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6640496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6640864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6641216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6641496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0086s] [ 9%] 2025-09-07T09:35:00.6641775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6642049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6642342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6642634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6642913Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6643190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6644437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6644714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6644988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6645262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6645561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6645856Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 9%] 2025-09-07T09:35:00.6646128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6646405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6646754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0036s] [ 9%] 2025-09-07T09:35:00.6647032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 9%] 2025-09-07T09:35:00.6647308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 9%] 2025-09-07T09:35:00.6647604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 9%] 2025-09-07T09:35:00.6647898Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6648178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6648455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6648734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 9%] 2025-09-07T09:35:00.6649011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6649285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6649557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6649859Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6651125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6651401Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6651676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6651953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6652230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6652506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6652807Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6653100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6653377Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6653653Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6653931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6654211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6654486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6654759Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6655049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6655338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6655609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6655887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6656165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6656440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 9%] 2025-09-07T09:35:00.6656776Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6658136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6658510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6658862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6659261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6659617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6659972Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6660340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6660707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6661054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6661408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6661756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 9%] 2025-09-07T09:35:00.6662103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6662467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6662836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6663182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6663528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6663880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6664231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6664599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6664964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6666294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6666708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6667060Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6667408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6667788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6668156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6668503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6668849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 9%] 2025-09-07T09:35:00.6669198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6669545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6669915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6670280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6670630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6670979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6671329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6671677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6672046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6672416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6672766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6673117Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6674448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6674796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6675182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6675547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6675896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 9%] 2025-09-07T09:35:00.6676247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6676665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6677012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 9%] 2025-09-07T09:35:00.6677321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 9%] 2025-09-07T09:35:00.6677615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 9%] 2025-09-07T09:35:00.6677889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6678160Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6678436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 9%] 2025-09-07T09:35:00.6678710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 9%] 2025-09-07T09:35:00.6678982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 9%] 2025-09-07T09:35:00.6679254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6679548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0036s] [ 9%] 2025-09-07T09:35:00.6679837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 9%] 2025-09-07T09:35:00.6680104Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6680374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6681627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 9%] 2025-09-07T09:35:00.6681903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 9%] 2025-09-07T09:35:00.6682172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6682468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6682759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 9%] 2025-09-07T09:35:00.6683035Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6683309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.6683583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.6683860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 9%] 2025-09-07T09:35:00.6684135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.6684411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 9%] 2025-09-07T09:35:00.6684698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 10%] 2025-09-07T09:35:00.6684983Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0049s] [ 10%] 2025-09-07T09:35:00.6685256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 10%] 2025-09-07T09:35:00.6685525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6685797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6686068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6686339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6686693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.6686982Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.6688227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6688506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6688784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.6689056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.6689333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 10%] 2025-09-07T09:35:00.6689610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6689925Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.6690217Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6690489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6690760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6691028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.6691299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6691572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0034s] [ 10%] 2025-09-07T09:35:00.6691864Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.6692149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.6692418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6692770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6693120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6693475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6693821Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6694190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6695520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6695869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6696221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6696626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 10%] 2025-09-07T09:35:00.6697004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6697370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6697710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6698059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6698411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6698756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6699150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6699528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6699897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6700244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6700592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6700943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6701306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6701675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6702025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6702369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6703698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6704047Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6704392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6704762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6705130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6705476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6705823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
10%] 2025-09-07T09:35:00.6706175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6706611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6706974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6707323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6707674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6708025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6708375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6708726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6709092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6709454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6709801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6710147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6710494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6711841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6712206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6712554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6712836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 10%] 2025-09-07T09:35:00.6713118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 
10%] 2025-09-07T09:35:00.6713397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 10%] 2025-09-07T09:35:00.6713672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0047s] [ 10%] 2025-09-07T09:35:00.6713957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.6714255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.6714548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 10%] 2025-09-07T09:35:00.6714824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0047s] [ 10%] 2025-09-07T09:35:00.6715098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 10%] 2025-09-07T09:35:00.6715376Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.6715648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0038s] [ 10%] 2025-09-07T09:35:00.6715921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0045s] [ 10%] 2025-09-07T09:35:00.6716215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.6716579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.6716851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0037s] [ 10%] 2025-09-07T09:35:00.6717124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0045s] [ 10%] 2025-09-07T09:35:00.6717403Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 10%] 2025-09-07T09:35:00.6718656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6718938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 10%] 2025-09-07T09:35:00.6719216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0052s] [ 10%] 2025-09-07T09:35:00.6719542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6719841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6720118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 10%] 2025-09-07T09:35:00.6720395Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0052s] [ 10%] 2025-09-07T09:35:00.6720672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0041s] [ 10%] 2025-09-07T09:35:00.6720948Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6721221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0043s] [ 10%] 2025-09-07T09:35:00.6721524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0053s] [ 10%] 2025-09-07T09:35:00.6721817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6722093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6722369Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0045s] [ 10%] 2025-09-07T09:35:00.6722645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0051s] [ 10%] 2025-09-07T09:35:00.6722923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6723199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6723477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 10%] 2025-09-07T09:35:00.6723769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0052s] [ 10%] 2025-09-07T09:35:00.6724063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6725313Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6725593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 10%] 2025-09-07T09:35:00.6725874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0052s] [ 10%] 2025-09-07T09:35:00.6726148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6726441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.6726795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0037s] [ 10%] 2025-09-07T09:35:00.6727073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0051s] [ 10%] 2025-09-07T09:35:00.6727349Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6727625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.6727904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0037s] [ 10%] 2025-09-07T09:35:00.6728179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0051s] [ 10%] 2025-09-07T09:35:00.6728534Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 10%] 2025-09-07T09:35:00.6728924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6729300Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6729654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6730007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6730362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6730729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6731099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6731450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6731802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6733140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6733491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6733839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6734215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6734580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6734931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6735286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6735638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6736010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6736383Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6736801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6737158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6737513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6737865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6738215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6738596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6739005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6739356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6739711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6740063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6741422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6741794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6742150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6742505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6742861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6743212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6743570Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6743943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6744313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6744665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6745026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6745379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6745746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6746109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6746461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6746879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6747230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6747581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6747861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.6748173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.6749466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 11%] 2025-09-07T09:35:00.6749744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 11%] 2025-09-07T09:35:00.6750025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.6750306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.6750588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 
PASSED [0.0034s] [ 11%] 2025-09-07T09:35:00.6750865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 11%] 2025-09-07T09:35:00.6751168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.6751462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.6751734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 11%] 2025-09-07T09:35:00.6752007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0038s] [ 11%] 2025-09-07T09:35:00.6752284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.6752561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.6752838Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0034s] [ 11%] 2025-09-07T09:35:00.6753112Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 11%] 2025-09-07T09:35:00.6753409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.6753699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.6753979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 11%] 2025-09-07T09:35:00.6754257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 11%] 2025-09-07T09:35:00.6754539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6754823Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.6756099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 11%] 2025-09-07T09:35:00.6756404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0041s] [ 11%] 2025-09-07T09:35:00.6756769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6757044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6757320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0036s] [ 11%] 2025-09-07T09:35:00.6757596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0040s] [ 11%] 2025-09-07T09:35:00.6757872Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6758145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6758424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0037s] [ 11%] 2025-09-07T09:35:00.6758724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 11%] 2025-09-07T09:35:00.6759022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6759300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.6759581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 11%] 2025-09-07T09:35:00.6759859Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 11%] 2025-09-07T09:35:00.6760138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6760419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.6760725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 11%] 2025-09-07T09:35:00.6761019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0041s] [ 11%] 2025-09-07T09:35:00.6761294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6761572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6762834Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 11%] 2025-09-07T09:35:00.6763111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0040s] [ 11%] 2025-09-07T09:35:00.6763387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6763663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.6763965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0033s] [ 11%] 2025-09-07T09:35:00.6764258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0041s] [ 11%] 2025-09-07T09:35:00.6764613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 
2025-09-07T09:35:00.6764968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6765323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6765691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6766067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6766420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6766853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6767207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6767557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6767907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6768278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6768643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6768992Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6769342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6770703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6771076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6771451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6771808Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6772161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6772513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6772868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6773224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6773595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6773969Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6774321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6774674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6775023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6775387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 11%] 2025-09-07T09:35:00.6775753Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 12%] 2025-09-07T09:35:00.6776111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6776463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6776875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6777228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6777582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6779030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6779406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6779770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6780127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6780482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6780855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6781225Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6781582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6781935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6782289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6782640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6782991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 12%] 2025-09-07T09:35:00.6783356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6783718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6784000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0053s] [ 12%] 2025-09-07T09:35:00.6784279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 12%] 2025-09-07T09:35:00.6784554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.6784827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6785123Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 12%] 2025-09-07T09:35:00.6785412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 12%] 2025-09-07T09:35:00.6785686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6787010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6787287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0040s] [ 12%] 2025-09-07T09:35:00.6787563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 12%] 2025-09-07T09:35:00.6787836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6788103Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6788410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 12%] 2025-09-07T09:35:00.6788701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 12%] 2025-09-07T09:35:00.6788971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6789242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6789520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 12%] 2025-09-07T09:35:00.6789798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6790069Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6790361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6790656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6790934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6791209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6791486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6791761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 12%] 2025-09-07T09:35:00.6792035Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6792308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6793590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6793886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6794158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6794431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6794706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.6794983Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6795258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6795551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6795841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.6796119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6796396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6796745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6797023Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6797296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6797568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 12%] 2025-09-07T09:35:00.6797870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6798160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.6798439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.6798712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.6798987Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.6800244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.6800601Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6800983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6801354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6801701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6802056Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6802406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6802758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6803129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6803495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6803842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 12%] 2025-09-07T09:35:00.6804191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6804537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6804885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6805258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6805619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6805964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6806317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6806733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6807081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6808498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6808887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6809241Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6809594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6809944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6810292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6810683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6811054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 12%] 2025-09-07T09:35:00.6811398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6811750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6812104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6812449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6812814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6813182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6813532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6813882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6814229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6814580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6814945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6815311Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6816724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6817078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6817431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6817777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6818156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 12%] 2025-09-07T09:35:00.6818530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6818881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6819283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6819632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 12%] 2025-09-07T09:35:00.6819912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0056s] [ 12%] 2025-09-07T09:35:00.6820240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 12%] 2025-09-07T09:35:00.6820544Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0081s] [ 12%] 2025-09-07T09:35:00.6820822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0069s] [ 12%] 2025-09-07T09:35:00.6821105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 12%] 2025-09-07T09:35:00.6821387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 12%] 2025-09-07T09:35:00.6821664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0080s] [ 13%] 2025-09-07T09:35:00.6821941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0069s] [ 13%] 2025-09-07T09:35:00.6822217Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0049s] [ 13%] 2025-09-07T09:35:00.6822510Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6822798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6823072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0069s] [ 13%] 2025-09-07T09:35:00.6824340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6824619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6824898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6825173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0068s] [ 13%] 2025-09-07T09:35:00.6825472Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0050s] [ 13%] 2025-09-07T09:35:00.6825783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 13%] 2025-09-07T09:35:00.6826061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0102s] [ 13%] 2025-09-07T09:35:00.6826339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0087s] [ 13%] 2025-09-07T09:35:00.6826698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 13%] 2025-09-07T09:35:00.6826981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0045s] [ 13%] 2025-09-07T09:35:00.6827259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0099s] [ 13%] 2025-09-07T09:35:00.6827537Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0082s] [ 13%] 2025-09-07T09:35:00.6827844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0049s] [ 13%] 2025-09-07T09:35:00.6828135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0044s] [ 13%] 2025-09-07T09:35:00.6828411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0097s] [ 13%] 2025-09-07T09:35:00.6828687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6828966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0044s] [ 13%] 2025-09-07T09:35:00.6829242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0044s] [ 13%] 2025-09-07T09:35:00.6829516Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0102s] [ 13%] 2025-09-07T09:35:00.6829817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0080s] [ 13%] 2025-09-07T09:35:00.6831169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6831449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6831727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0095s] [ 13%] 2025-09-07T09:35:00.6832009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6832294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6832574Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6832874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0097s] [ 13%] 2025-09-07T09:35:00.6833168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6833444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6833720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6833997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0095s] [ 13%] 2025-09-07T09:35:00.6834278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0083s] [ 13%] 2025-09-07T09:35:00.6834553Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6834850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0042s] [ 13%] 2025-09-07T09:35:00.6835143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0097s] [ 13%] 2025-09-07T09:35:00.6835419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6835695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0136s] [ 13%] 2025-09-07T09:35:00.6835972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6836248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0081s] [ 13%] 2025-09-07T09:35:00.6836600Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0063s] [ 13%] 2025-09-07T09:35:00.6837860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6838184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6838490Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0079s] [ 13%] 2025-09-07T09:35:00.6838765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0063s] [ 13%] 2025-09-07T09:35:00.6839039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0127s] [ 13%] 2025-09-07T09:35:00.6839318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6839592Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0082s] [ 13%] 2025-09-07T09:35:00.6839863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0063s] [ 13%] 2025-09-07T09:35:00.6840161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6840456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6840729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0080s] [ 13%] 2025-09-07T09:35:00.6841006Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0063s] [ 13%] 2025-09-07T09:35:00.6841283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0137s] [ 13%] 2025-09-07T09:35:00.6841560Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 13%] 2025-09-07T09:35:00.6841835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6842111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0075s] [ 13%] 2025-09-07T09:35:00.6842413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6842705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6842981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6843258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0074s] [ 13%] 2025-09-07T09:35:00.6843533Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0126s] [ 13%] 2025-09-07T09:35:00.6844781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6845057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6845366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0075s] [ 13%] 2025-09-07T09:35:00.6845663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6845940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6846215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6846556Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0074s] [ 13%] 2025-09-07T09:35:00.6846834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6847108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6847386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6847688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0074s] [ 13%] 2025-09-07T09:35:00.6847994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6848271Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6848549Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6848825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0075s] [ 13%] 2025-09-07T09:35:00.6849097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6849370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6849678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0096s] [ 13%] 2025-09-07T09:35:00.6849973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0075s] [ 13%] 2025-09-07T09:35:00.6850252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6851510Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 13%] 2025-09-07T09:35:00.6851787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0095s] [ 13%] 2025-09-07T09:35:00.6852063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0073s] [ 13%] 2025-09-07T09:35:00.6852341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.6852620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.6852920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0082s] [ 13%] 2025-09-07T09:35:00.6853217Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0061s] [ 13%] 2025-09-07T09:35:00.6853497Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.6853778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.6854056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0080s] [ 13%] 2025-09-07T09:35:00.6854334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0060s] [ 13%] 2025-09-07T09:35:00.6854629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 13%] 2025-09-07T09:35:00.6854918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 13%] 2025-09-07T09:35:00.6855194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0083s] [ 13%] 2025-09-07T09:35:00.6855468Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0063s] [ 13%] 2025-09-07T09:35:00.6855746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.6856024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 13%] 2025-09-07T09:35:00.6856302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0082s] [ 13%] 2025-09-07T09:35:00.6856642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0062s] [ 13%] 2025-09-07T09:35:00.6856955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6857255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6858517Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0097s] [ 13%] 2025-09-07T09:35:00.6858795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0071s] [ 13%] 2025-09-07T09:35:00.6859132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6859417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6859695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0097s] [ 13%] 2025-09-07T09:35:00.6859998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0072s] [ 13%] 2025-09-07T09:35:00.6860292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0040s] [ 13%] 2025-09-07T09:35:00.6860571Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6860845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0098s] [ 13%] 2025-09-07T09:35:00.6861122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0072s] [ 13%] 2025-09-07T09:35:00.6861402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6861677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 13%] 2025-09-07T09:35:00.6861952Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0098s] [ 13%] 2025-09-07T09:35:00.6862241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0072s] [ 13%] 2025-09-07T09:35:00.6862531Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 13%] 2025-09-07T09:35:00.6862809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6863087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0097s] [ 14%] 2025-09-07T09:35:00.6863366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0072s] [ 14%] 2025-09-07T09:35:00.6863651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6863932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6865210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0097s] [ 14%] 2025-09-07T09:35:00.6865506Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0072s] [ 14%] 2025-09-07T09:35:00.6865783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6866059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6866336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0096s] [ 14%] 2025-09-07T09:35:00.6866668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0072s] [ 14%] 2025-09-07T09:35:00.6866944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6867220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6867531Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0097s] [ 14%] 2025-09-07T09:35:00.6867826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0073s] [ 14%] 2025-09-07T09:35:00.6868102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0093s] [ 14%] 2025-09-07T09:35:00.6868383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.6868661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0079s] [ 14%] 2025-09-07T09:35:00.6868938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0060s] [ 14%] 2025-09-07T09:35:00.6869216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.6869511Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.6869805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0079s] [ 14%] 2025-09-07T09:35:00.6870079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 14%] 2025-09-07T09:35:00.6870353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0088s] [ 14%] 2025-09-07T09:35:00.6870627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.6871885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0080s] [ 14%] 2025-09-07T09:35:00.6872159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0060s] [ 14%] 2025-09-07T09:35:00.6872436Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.6872734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.6873028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0079s] [ 14%] 2025-09-07T09:35:00.6873299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0059s] [ 14%] 2025-09-07T09:35:00.6873581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0088s] [ 14%] 2025-09-07T09:35:00.6873859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6874134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0094s] [ 14%] 2025-09-07T09:35:00.6874430Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0071s] [ 14%] 2025-09-07T09:35:00.6874723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6875001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6875277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0094s] [ 14%] 2025-09-07T09:35:00.6875552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0071s] [ 14%] 2025-09-07T09:35:00.6875829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0085s] [ 14%] 2025-09-07T09:35:00.6876105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6876379Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0096s] [ 14%] 2025-09-07T09:35:00.6876750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0071s] [ 14%] 2025-09-07T09:35:00.6877042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6877316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6877592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0093s] [ 14%] 2025-09-07T09:35:00.6878876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0070s] [ 14%] 2025-09-07T09:35:00.6879155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6879434Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6879745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0094s] [ 14%] 2025-09-07T09:35:00.6880042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0070s] [ 14%] 2025-09-07T09:35:00.6880320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 14%] 2025-09-07T09:35:00.6880599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6880875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0093s] [ 14%] 2025-09-07T09:35:00.6881153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0071s] [ 14%] 2025-09-07T09:35:00.6881427Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 14%] 2025-09-07T09:35:00.6881699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6882010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0094s] [ 14%] 2025-09-07T09:35:00.6882366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0071s] [ 14%] 2025-09-07T09:35:00.6882640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 14%] 2025-09-07T09:35:00.6882940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6883232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0094s] [ 14%] 2025-09-07T09:35:00.6883509Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0070s] [ 14%] 2025-09-07T09:35:00.6883802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 14%] 2025-09-07T09:35:00.6884122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 14%] 2025-09-07T09:35:00.6884436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0042s] [ 14%] 2025-09-07T09:35:00.6885699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 14%] 2025-09-07T09:35:00.6885977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 14%] 2025-09-07T09:35:00.6886255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 14%] 2025-09-07T09:35:00.6886596Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 14%] 2025-09-07T09:35:00.6886871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 14%] 2025-09-07T09:35:00.6887144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 14%] 2025-09-07T09:35:00.6887455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 14%] 2025-09-07T09:35:00.6887751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0042s] [ 14%] 2025-09-07T09:35:00.6888025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0036s] [ 14%] 2025-09-07T09:35:00.6888299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 14%] 2025-09-07T09:35:00.6888574Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 14%] 2025-09-07T09:35:00.6888845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0042s] [ 14%] 2025-09-07T09:35:00.6889117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0036s] [ 14%] 2025-09-07T09:35:00.6889423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6889739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.6890014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6890290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0047s] [ 14%] 2025-09-07T09:35:00.6890569Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6890846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.6891129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6892417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0048s] [ 14%] 2025-09-07T09:35:00.6892714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0034s] [ 14%] 2025-09-07T09:35:00.6893002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6893277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6893551Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0046s] [ 14%] 2025-09-07T09:35:00.6893827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6894103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6894375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0059s] [ 14%] 2025-09-07T09:35:00.6894665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0047s] [ 14%] 2025-09-07T09:35:00.6894961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.6895236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6895513Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6895788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0048s] [ 14%] 2025-09-07T09:35:00.6896066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.6896342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.6896670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6896971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0048s] [ 14%] 2025-09-07T09:35:00.6897258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6897531Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6897805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6898078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0052s] [ 14%] 2025-09-07T09:35:00.6900769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6901097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.6901394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0058s] [ 14%] 2025-09-07T09:35:00.6901668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0047s] [ 14%] 2025-09-07T09:35:00.6901941Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0109s] [ 14%] 2025-09-07T09:35:00.6902215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.6902494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 14%] 2025-09-07T09:35:00.6902774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.6903048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.6903342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.6903632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 14%] 2025-09-07T09:35:00.6903905Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.6904176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0106s] [ 14%] 2025-09-07T09:35:00.6904448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.6904719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0047s] [ 14%] 2025-09-07T09:35:00.6904987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 15%] 2025-09-07T09:35:00.6905304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6905588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6905858Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0046s] [ 15%] 2025-09-07T09:35:00.6906127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0038s] [ 15%] 2025-09-07T09:35:00.6907599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0107s] [ 15%] 2025-09-07T09:35:00.6907882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6908160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 15%] 2025-09-07T09:35:00.6908432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0051s] [ 15%] 2025-09-07T09:35:00.6908745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6909041Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6909315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 15%] 2025-09-07T09:35:00.6909591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0050s] [ 15%] 2025-09-07T09:35:00.6909864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0103s] [ 15%] 2025-09-07T09:35:00.6910136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 15%] 2025-09-07T09:35:00.6910409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0062s] [ 15%] 2025-09-07T09:35:00.6910697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 15%] 2025-09-07T09:35:00.6910986Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6911257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6911528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0062s] [ 15%] 2025-09-07T09:35:00.6911800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 15%] 2025-09-07T09:35:00.6912075Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6912348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6912623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 15%] 2025-09-07T09:35:00.6912911Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0050s] [ 15%] 2025-09-07T09:35:00.6913198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.6914465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6914743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0061s] [ 15%] 2025-09-07T09:35:00.6915019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0050s] [ 15%] 2025-09-07T09:35:00.6915293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6915562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6915855Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0063s] [ 15%] 2025-09-07T09:35:00.6916140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 15%] 2025-09-07T09:35:00.6916415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6916758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6917032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0061s] [ 15%] 2025-09-07T09:35:00.6917304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0049s] [ 15%] 2025-09-07T09:35:00.6917579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.6917856Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6918173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.6918462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 15%] 2025-09-07T09:35:00.6918738Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6919017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6919294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.6919568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 15%] 2025-09-07T09:35:00.6919840Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 15%] 2025-09-07T09:35:00.6921180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6921479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.6921752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 15%] 2025-09-07T09:35:00.6922024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6922297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.6922570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.6922841Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0034s] [ 15%] 2025-09-07T09:35:00.6923116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6923408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6923697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6923970Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 15%] 2025-09-07T09:35:00.6924248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6924527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0045s] [ 15%] 2025-09-07T09:35:00.6924803Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 15%] 2025-09-07T09:35:00.6925077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 15%] 2025-09-07T09:35:00.6925363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 15%] 2025-09-07T09:35:00.6925651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.6925925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0031s] [ 15%] 2025-09-07T09:35:00.6926197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.6926472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.6927792Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6928065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6928337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.6928649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6928944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6929221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6929494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 15%] 2025-09-07T09:35:00.6929773Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6930049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6930324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6930621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.6930933Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.6931204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6931479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6931752Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0036s] [ 15%] 2025-09-07T09:35:00.6932025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6932298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.6932569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.6932856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0036s] [ 15%] 2025-09-07T09:35:00.6933223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6933578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6934907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6935258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6935629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6936000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6936352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6936769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6937120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6937470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6937817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6938194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6938558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6938906Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6939323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6939669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6940040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6940409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6940762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 15%] 2025-09-07T09:35:00.6941111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6941466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6941819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6943159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6943531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6943895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6944243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6944594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6944944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6945308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6945673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6946022Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6946368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6946788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6947145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6947497Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6947874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 15%] 2025-09-07T09:35:00.6948243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6948596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 15%] 2025-09-07T09:35:00.6948947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6949301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6949673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6950037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6951368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6951716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6952069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6952425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6952773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6953146Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6953440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0054s] [ 16%] 2025-09-07T09:35:00.6953715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6953987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 16%] 2025-09-07T09:35:00.6954263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 16%] 2025-09-07T09:35:00.6954541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 16%] 2025-09-07T09:35:00.6954821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 16%] 2025-09-07T09:35:00.6955118Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 16%] 2025-09-07T09:35:00.6955402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 16%] 2025-09-07T09:35:00.6955674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 16%] 2025-09-07T09:35:00.6955946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 16%] 2025-09-07T09:35:00.6956218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6956549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 16%] 2025-09-07T09:35:00.6956820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 16%] 2025-09-07T09:35:00.6957094Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 16%] 2025-09-07T09:35:00.6958392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6958686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 16%] 2025-09-07T09:35:00.6958960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6959235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6959513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.6959789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0037s] [ 16%] 2025-09-07T09:35:00.6960065Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6960384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6960678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.6960953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 16%] 2025-09-07T09:35:00.6961228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0072s] [ 16%] 2025-09-07T09:35:00.6961505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6961780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.6962052Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 16%] 2025-09-07T09:35:00.6962326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6962617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6962899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.6963172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 16%] 2025-09-07T09:35:00.6963448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6963724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6964967Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.6965240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0037s] [ 16%] 2025-09-07T09:35:00.6965541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6965833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6966111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.6966388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0037s] [ 16%] 2025-09-07T09:35:00.6966709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6966982Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6967255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.6967530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.6967841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6968132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6968403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.6968678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.6969032Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6969383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6969761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6970128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6970484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6970837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 16%] 2025-09-07T09:35:00.6971189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6972522Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6972892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6973262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6973613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6973963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6974314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6974660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6975023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6975382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6975736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6976091Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6976442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6976840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6977221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6977588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6977941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 16%] 2025-09-07T09:35:00.6978293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6978643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6979064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6979438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6980789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6981141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6981495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6981845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6982216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6982592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6982961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6983348Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6983700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6984051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6984402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6984764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6985125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 16%] 2025-09-07T09:35:00.6985472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6985823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6986172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6986594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6986964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6987330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6987678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6989013Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 16%] 2025-09-07T09:35:00.6989295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 16%] 2025-09-07T09:35:00.6989572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 16%] 2025-09-07T09:35:00.6989843Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.6990142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 16%] 2025-09-07T09:35:00.6990434Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.6990706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.6990979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6991251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6991521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.6991789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 16%] 2025-09-07T09:35:00.6992056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6992365Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6992647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 16%] 2025-09-07T09:35:00.6992919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 16%] 2025-09-07T09:35:00.6993186Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6993455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.6993730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 17%] 2025-09-07T09:35:00.6994002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.6994289Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6995532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6995809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.6996082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.6996359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6996715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.6996984Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0044s] [ 17%] 2025-09-07T09:35:00.6997255Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.6997564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6997849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6998119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.6998391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.6998665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6998935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.6999209Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.6999506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.6999793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.7000061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.7000336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.7000609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.7000880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.7002126Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.7002397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 17%] 2025-09-07T09:35:00.7002695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.7002978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.7003243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.7003517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.7003788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.7004057Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.7004326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.7004693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7005055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7005402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7005749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7006099Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7006447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7006887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7007253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7007598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7007943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
17%] 2025-09-07T09:35:00.7008288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7008631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7009985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7010358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7010701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7011046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7011394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7011740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7012104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7012472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7012823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7013176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7013525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7013872Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7014235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7014593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7014936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7015279Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7015627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7015972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7016330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7016741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7018065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 
2025-09-07T09:35:00.7018419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7018769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7019172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7019568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7019938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7020286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7020638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7020985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7021329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7021692Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7022049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7022396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0010s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7022745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7023090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7023433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 17%] 2025-09-07T09:35:00.7023732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 17%] 2025-09-07T09:35:00.7024026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.7024304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 17%] 2025-09-07T09:35:00.7024579Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0051s] [ 17%] 2025-09-07T09:35:00.7025833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.7026114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.7026391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 17%] 2025-09-07T09:35:00.7026716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0050s] [ 17%] 2025-09-07T09:35:00.7027038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.7027332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.7027605Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0043s] [ 17%] 2025-09-07T09:35:00.7027879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 17%] 2025-09-07T09:35:00.7028159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.7028435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.7028709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0044s] [ 17%] 2025-09-07T09:35:00.7029021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 17%] 2025-09-07T09:35:00.7029322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7029599Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7029876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 17%] 2025-09-07T09:35:00.7030156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0054s] [ 17%] 2025-09-07T09:35:00.7030440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7030718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 17%] 2025-09-07T09:35:00.7030995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 17%] 2025-09-07T09:35:00.7031288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0054s] [ 17%] 2025-09-07T09:35:00.7032556Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7032834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7033109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 17%] 2025-09-07T09:35:00.7033388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0056s] [ 17%] 2025-09-07T09:35:00.7033670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7033946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7034245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 17%] 2025-09-07T09:35:00.7034535Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0057s] [ 17%] 2025-09-07T09:35:00.7034812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7035091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 17%] 2025-09-07T09:35:00.7035369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 17%] 2025-09-07T09:35:00.7035648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0056s] [ 17%] 2025-09-07T09:35:00.7035929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7036211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 17%] 2025-09-07T09:35:00.7036564Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 17%] 2025-09-07T09:35:00.7036856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0055s] [ 17%] 2025-09-07T09:35:00.7037130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7037407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 17%] 2025-09-07T09:35:00.7037683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 18%] 2025-09-07T09:35:00.7037957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0056s] [ 18%] 2025-09-07T09:35:00.7039238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 18%] 2025-09-07T09:35:00.7039533Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 18%] 2025-09-07T09:35:00.7039811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 18%] 2025-09-07T09:35:00.7040087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0057s] [ 18%] 2025-09-07T09:35:00.7040444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7040802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7041157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7041511Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7041887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7042258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7042610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7042966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7043319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7043683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7044043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7044390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7044741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7045093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7045444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7045794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7046168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7047596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7047951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7048306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7048667Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7049051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7049423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7049777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7050130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7050485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7050838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7051189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7051561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7051925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7052275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7052627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7052982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7053356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7053718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7054069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7054426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7055762Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7056114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7056549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7056929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7057281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7057634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7057984Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7058336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7058720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7059133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7059483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7059766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 
PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7060049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7060329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 18%] 2025-09-07T09:35:00.7060607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 18%] 2025-09-07T09:35:00.7060918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7061211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7061489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7061767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 18%] 2025-09-07T09:35:00.7062042Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7063302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7063578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0051s] [ 18%] 2025-09-07T09:35:00.7063869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0060s] [ 18%] 2025-09-07T09:35:00.7064178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7064459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7064734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 18%] 2025-09-07T09:35:00.7065010Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0060s] [ 18%] 2025-09-07T09:35:00.7065291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7065568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7065847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7066148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0047s] [ 18%] 2025-09-07T09:35:00.7066445Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7066790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7067067Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7067349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0045s] [ 18%] 2025-09-07T09:35:00.7067626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7067901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 18%] 2025-09-07T09:35:00.7068208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7068499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0047s] [ 18%] 2025-09-07T09:35:00.7068778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 18%] 2025-09-07T09:35:00.7070041Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7070322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7070601Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0047s] [ 18%] 2025-09-07T09:35:00.7070880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.7071159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7071473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 18%] 2025-09-07T09:35:00.7071772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0045s] [ 18%] 2025-09-07T09:35:00.7072053Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7072337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7072616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7072895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0045s] [ 18%] 2025-09-07T09:35:00.7073169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7073469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7073758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7074030Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0045s] [ 18%] 2025-09-07T09:35:00.7074310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7074588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.7074865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0048s] [ 18%] 2025-09-07T09:35:00.7075140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0045s] [ 18%] 2025-09-07T09:35:00.7075512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7076983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7077338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7077700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7078059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7078416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7078809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7079178Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7079529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7079883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7080237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7080586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7080955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 18%] 2025-09-07T09:35:00.7081321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7081671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7082024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7082384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7082736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7083106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7083475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7083832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 18%] 2025-09-07T09:35:00.7085171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7085530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7085884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7086257Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7086685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7087037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7087387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7087742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7088093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 19%] 2025-09-07T09:35:00.7088471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7088835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7089187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7089543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7089896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7090248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7090620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7090993Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7091345Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7091698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7092054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7093385Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7093758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7094123Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7094479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7094830Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7095184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 19%] 2025-09-07T09:35:00.7095532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7095813Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0050s] [ 19%] 2025-09-07T09:35:00.7096104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 19%] 2025-09-07T09:35:00.7096406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 19%] 2025-09-07T09:35:00.7096746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 19%] 2025-09-07T09:35:00.7097027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 19%] 2025-09-07T09:35:00.7097308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED 
[0.0024s] [ 19%] 2025-09-07T09:35:00.7097585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 19%] 2025-09-07T09:35:00.7097891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 19%] 2025-09-07T09:35:00.7098180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 19%] 2025-09-07T09:35:00.7098454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 19%] 2025-09-07T09:35:00.7098724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 19%] 2025-09-07T09:35:00.7099041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 19%] 2025-09-07T09:35:00.7099316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 19%] 2025-09-07T09:35:00.7099587Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 19%] 2025-09-07T09:35:00.7100846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 19%] 2025-09-07T09:35:00.7101147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 19%] 2025-09-07T09:35:00.7101446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7101721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7101996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 19%] 2025-09-07T09:35:00.7102274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 19%] 2025-09-07T09:35:00.7102552Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7102829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7103130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 19%] 2025-09-07T09:35:00.7103419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 19%] 2025-09-07T09:35:00.7103693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 19%] 2025-09-07T09:35:00.7103964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7104237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 19%] 2025-09-07T09:35:00.7104509Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 19%] 2025-09-07T09:35:00.7104784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7105056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7105349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 19%] 2025-09-07T09:35:00.7105632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 19%] 2025-09-07T09:35:00.7105907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7106183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7107503Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 19%] 2025-09-07T09:35:00.7107782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 19%] 2025-09-07T09:35:00.7108059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7108371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 19%] 2025-09-07T09:35:00.7108668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 19%] 2025-09-07T09:35:00.7108945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 19%] 2025-09-07T09:35:00.7109218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7109492Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7109767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0031s] [ 19%] 2025-09-07T09:35:00.7110037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 19%] 2025-09-07T09:35:00.7110311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7110610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 19%] 2025-09-07T09:35:00.7110899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 19%] 2025-09-07T09:35:00.7111171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 19%] 2025-09-07T09:35:00.7111526Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7111879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7112227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7112593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7112961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7113313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 19%] 2025-09-07T09:35:00.7114632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7114985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7115334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7115683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7116049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7116410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7116824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7117177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7117525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7117899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7118268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7118622Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7118971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7119321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7119673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7120026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7120403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 19%] 2025-09-07T09:35:00.7120774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7121125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7121472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7122794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7123159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7123527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7123880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7124233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7124584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7124936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7125288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7125656Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7126018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7126376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7126805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7127157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7127528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 19%] 2025-09-07T09:35:00.7127895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7128243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7128591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7128939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7129291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7129641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7130997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 19%] 2025-09-07T09:35:00.7131365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7131642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0047s] [ 20%] 2025-09-07T09:35:00.7131922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7132198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7132471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7132763Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7133051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7133323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7133594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 20%] 2025-09-07T09:35:00.7133865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 20%] 2025-09-07T09:35:00.7134139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7134407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7134674Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7134964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7135245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7135516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7135791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7136067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 20%] 2025-09-07T09:35:00.7136347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7136686Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7137974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7138276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7138557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7138831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 20%] 2025-09-07T09:35:00.7139173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 20%] 2025-09-07T09:35:00.7139449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0041s] [ 20%] 2025-09-07T09:35:00.7139723Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7139995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7140296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 20%] 2025-09-07T09:35:00.7140587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7140859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7141129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 20%] 2025-09-07T09:35:00.7141403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7141677Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7141953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7142241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7142526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7142804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7143081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7143356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7144607Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7144880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7145156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7145450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7145737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7146010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7146283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7146621Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7146893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7147245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7147630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7147996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7148340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7148693Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7149045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7149393Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7149772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7150132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7150478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) 
[ 20%] 2025-09-07T09:35:00.7150823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7152151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7152499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7152864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7153225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7153568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7153919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7154268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7154615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7154981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7155347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7155699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7156050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7156399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7156820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7157184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7157555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7157902Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7158249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7158596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7158941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7160291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7160666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) 
[ 20%] 2025-09-07T09:35:00.7161014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7161368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7161715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7162065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7162434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7162796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7163146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7163494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7163841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7164187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7164532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7164896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 
SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7165259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7165604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7165950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 20%] 2025-09-07T09:35:00.7166226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 20%] 2025-09-07T09:35:00.7166604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7166898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7168154Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7168431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7168709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 20%] 2025-09-07T09:35:00.7168985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7169259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7169530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7169802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 20%] 2025-09-07T09:35:00.7170095Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 20%] 2025-09-07T09:35:00.7170383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 20%] 2025-09-07T09:35:00.7170656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 20%] 2025-09-07T09:35:00.7170934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 20%] 2025-09-07T09:35:00.7171206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 20%] 2025-09-07T09:35:00.7171478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7171767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 20%] 2025-09-07T09:35:00.7172052Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7172324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7172598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7172876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7173155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7173429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 20%] 2025-09-07T09:35:00.7173702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 20%] 2025-09-07T09:35:00.7174974Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 20%] 2025-09-07T09:35:00.7175263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 20%] 2025-09-07T09:35:00.7175533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7175803Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 21%] 2025-09-07T09:35:00.7176079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7176353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7176678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7176983Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7177274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7177550Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7177824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7178097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 21%] 2025-09-07T09:35:00.7178376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7178654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7178928Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 21%] 2025-09-07T09:35:00.7179285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 21%] 2025-09-07T09:35:00.7179582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7179851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7180121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7180391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 21%] 2025-09-07T09:35:00.7181641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7181915Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7182209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7182496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 21%] 2025-09-07T09:35:00.7182848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7183196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7183549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7183896Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7184247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7184611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7184978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7185324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7185673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 21%] 2025-09-07T09:35:00.7186020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7186365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7186866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7187230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7187574Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7187920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7188265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7189596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7189974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7190340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7190687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7191042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7191392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7191743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7192107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7192465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7192812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7193156Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7193501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7193848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7194209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7194573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7194918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
21%] 2025-09-07T09:35:00.7195268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7195616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7195964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7196325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7197731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7198082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7198438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7198792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7199138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7199531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7199901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7200244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7200592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7200940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7201282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7201650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7201941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 21%] 2025-09-07T09:35:00.7202213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 21%] 
2025-09-07T09:35:00.7202483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7202755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7203028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 21%] 2025-09-07T09:35:00.7203299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 21%] 2025-09-07T09:35:00.7203571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7203866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7205111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0042s] [ 21%] 2025-09-07T09:35:00.7205380Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 21%] 2025-09-07T09:35:00.7205647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7205916Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7206187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 21%] 2025-09-07T09:35:00.7206455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 21%] 2025-09-07T09:35:00.7206827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7207113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7207385Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 21%] 2025-09-07T09:35:00.7207660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 21%] 2025-09-07T09:35:00.7207932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7208200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7208473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7208745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7209057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7209341Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7209611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 21%] 2025-09-07T09:35:00.7209880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7210148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7210414Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7210683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7211942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7212227Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7212495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7212766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7213043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7213317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7213584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7213862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7214163Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 21%] 2025-09-07T09:35:00.7214447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7214717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7214985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7215257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7215523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 21%] 2025-09-07T09:35:00.7215789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7216072Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7216354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 21%] 2025-09-07T09:35:00.7216679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7216947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 21%] 2025-09-07T09:35:00.7217298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7218622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7219007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7219390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 21%] 2025-09-07T09:35:00.7219757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7220104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7220451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7220798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7221141Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7221503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7221863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7222201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7222547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7222897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 
2025-09-07T09:35:00.7223237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7223592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7223950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7224298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7224644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7224991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7225339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7226736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7227103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7227447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7227791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7228134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7228479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7228842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7229205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7229548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7229892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7230234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7230582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7230944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7231306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7231651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7231999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7232349Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7232697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7233045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7233402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7234739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7235085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 
2025-09-07T09:35:00.7235429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7235775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7236150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7236576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7236920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7237198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 22%] 2025-09-07T09:35:00.7237478Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7237750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.7238022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 22%] 2025-09-07T09:35:00.7238302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7238615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7238904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7239177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 22%] 2025-09-07T09:35:00.7239450Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7239723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7239991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7240280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7240569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7241821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.7242093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7242367Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7242643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7242920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7243195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 22%] 2025-09-07T09:35:00.7243470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7243767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7244058Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7244335Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7244610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7244885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7245156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.7245443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7245729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7246003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7246277Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7246619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7246893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7247167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7247441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.7248732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7249025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.7249307Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7249584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7249865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7250140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.7250411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7250714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7251003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7251276Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7251551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.7251825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.7252098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.7252371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.7252724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7253092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7253453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7253804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7254156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7254508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7255847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7256213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7256643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7256991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7257343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7257689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7258041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7258410Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7258774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7259168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7259526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7259878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7260227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 22%] 2025-09-07T09:35:00.7260594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7260962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7261312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7261664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7262014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7262362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7262722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7264064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7264411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7264766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7265120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7265467Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 22%] 2025-09-07T09:35:00.7265833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7266200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7266613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7266961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7267313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 23%] 2025-09-07T09:35:00.7267664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7268040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7268411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7268759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7269107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7269456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7269803Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7270167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7270532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7270881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7272208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7272558Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7272840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 23%] 2025-09-07T09:35:00.7273145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7273435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7273709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7273987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7274263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7274540Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7274813Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7275105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7275390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7275660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7275929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 23%] 2025-09-07T09:35:00.7276200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7276474Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7276823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7277095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7277399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 23%] 2025-09-07T09:35:00.7277690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 23%] 2025-09-07T09:35:00.7278940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 23%] 2025-09-07T09:35:00.7279215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7279495Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7279774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7280049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 23%] 2025-09-07T09:35:00.7280355Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7280649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 23%] 2025-09-07T09:35:00.7280922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 23%] 2025-09-07T09:35:00.7281192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0031s] [ 23%] 2025-09-07T09:35:00.7281465Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7281739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7282013Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7282285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 23%] 2025-09-07T09:35:00.7282576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7282866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7283141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7283416Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 23%] 2025-09-07T09:35:00.7283688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7283965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7284239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 23%] 2025-09-07T09:35:00.7284527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 23%] 2025-09-07T09:35:00.7285779Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 23%] 2025-09-07T09:35:00.7286055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7286327Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 23%] 2025-09-07T09:35:00.7286660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 23%] 2025-09-07T09:35:00.7286934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7287210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7287480Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7287790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 23%] 2025-09-07T09:35:00.7288086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 23%] 2025-09-07T09:35:00.7288439Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7288791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7289141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7289489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7289863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7290230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 23%] 2025-09-07T09:35:00.7290581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7290935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7291286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7291632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7292001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7293328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7293682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7294032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7294380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7294726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7295093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7295460Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7295809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7296160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7296594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7296944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7297322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 23%] 2025-09-07T09:35:00.7297687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7298033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7298385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7298732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7299119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7299486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7299850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7300197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7301530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7301884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7302238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7302590Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7302963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7303327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7303681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7304033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7304380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 23%] 2025-09-07T09:35:00.7304741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7305106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7305451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7305796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7306148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7306558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7306904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7307272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 23%] 2025-09-07T09:35:00.7307562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 23%] 2025-09-07T09:35:00.7307834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 23%] 2025-09-07T09:35:00.7308107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7309356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 23%] 2025-09-07T09:35:00.7309631Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 23%] 2025-09-07T09:35:00.7309925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 23%] 2025-09-07T09:35:00.7310216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 23%] 2025-09-07T09:35:00.7310489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 23%] 2025-09-07T09:35:00.7310755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 23%] 2025-09-07T09:35:00.7311026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 24%] 2025-09-07T09:35:00.7311294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7311562Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7311832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 24%] 2025-09-07T09:35:00.7312120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 24%] 2025-09-07T09:35:00.7312404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7312670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7312943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 24%] 2025-09-07T09:35:00.7313218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7313491Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7313764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7314051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7314344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7314616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7315857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7316127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 24%] 2025-09-07T09:35:00.7316397Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7316729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7316998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7317300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7317587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7317856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7318127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7318401Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7318673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7318946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7319238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7319528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7319806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7320078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7320351Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7320622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7320892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7321160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7321447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7322703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7322973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7323243Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7323515Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7323867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7324215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7324589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7324953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7325300Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7325649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7325996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7326340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7326758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7327121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
24%] 2025-09-07T09:35:00.7327464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7327807Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7328153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7328497Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7328863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7329227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7330549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7330902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7331252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7331597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7331975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7332342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7332690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7333038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7333383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7333727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7334086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7334439Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7334789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7335136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7335480Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7335823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7336182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 
2025-09-07T09:35:00.7336603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7336954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7337302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7338620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7339016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7339405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7339776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7340126Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7340474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7340817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7341161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7341506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7341879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7342250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7342597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 24%] 2025-09-07T09:35:00.7342874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0051s] [ 24%] 2025-09-07T09:35:00.7343145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7343414Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 24%] 2025-09-07T09:35:00.7343695Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7343982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7344253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 24%] 2025-09-07T09:35:00.7344524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7344798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7346031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 24%] 2025-09-07T09:35:00.7346300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7346634Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7346931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7347218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7347486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 24%] 2025-09-07T09:35:00.7347754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7348025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7348299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 24%] 2025-09-07T09:35:00.7348571Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7348869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 24%] 2025-09-07T09:35:00.7349154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 24%] 2025-09-07T09:35:00.7349426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7349697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7349968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 24%] 2025-09-07T09:35:00.7350241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 24%] 2025-09-07T09:35:00.7350508Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0034s] [ 24%] 2025-09-07T09:35:00.7350775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7351054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 24%] 2025-09-07T09:35:00.7351333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 24%] 2025-09-07T09:35:00.7352583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7352853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7353124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 24%] 2025-09-07T09:35:00.7353394Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 24%] 2025-09-07T09:35:00.7353666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7353954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 24%] 2025-09-07T09:35:00.7354242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 24%] 2025-09-07T09:35:00.7354511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7354781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7355054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7355326Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7355598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7355866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7356150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7356426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7356748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7357018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7357288Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7357555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7357823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7358120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 25%] 2025-09-07T09:35:00.7359384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7359655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7359922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7360196Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7360468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7360734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7361004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7361298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0042s] [ 25%] 2025-09-07T09:35:00.7361580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7361846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7362110Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7362379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7362646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7362910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7363192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7363487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 25%] 2025-09-07T09:35:00.7363756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7364026Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7364295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7364566Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7365800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7366070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 25%] 2025-09-07T09:35:00.7366361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7366700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 25%] 2025-09-07T09:35:00.7366967Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7367234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7367502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7367770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7368036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7368338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7368625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 25%] 2025-09-07T09:35:00.7368895Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7369164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7369433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7369701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7369973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7370243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7370537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7370824Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7371092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7371358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7372598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 25%] 2025-09-07T09:35:00.7372867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 25%] 2025-09-07T09:35:00.7373133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7373422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7373701Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7373972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 25%] 2025-09-07T09:35:00.7374242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 25%] 2025-09-07T09:35:00.7374512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 25%] 2025-09-07T09:35:00.7374782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7375050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7375320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 25%] 2025-09-07T09:35:00.7375606Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 25%] 2025-09-07T09:35:00.7375888Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7376159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7376429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 25%] 2025-09-07T09:35:00.7376757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 25%] 2025-09-07T09:35:00.7377025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7377290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7377588Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 25%] 2025-09-07T09:35:00.7377877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 25%] 2025-09-07T09:35:00.7379209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7379478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7379755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 25%] 2025-09-07T09:35:00.7380031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 25%] 2025-09-07T09:35:00.7380303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 25%] 2025-09-07T09:35:00.7380573Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7380873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7381169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7381440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7381711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7381982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 25%] 2025-09-07T09:35:00.7382251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7382520Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7382805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7383087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7383358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7383625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7383895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7384172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7384443Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7384714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7385982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7386275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7386615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7386885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7387158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7387432Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7387698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7387993Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7388278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7388549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7388818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 25%] 2025-09-07T09:35:00.7389084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7389353Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7389623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0091s] [ 25%] 2025-09-07T09:35:00.7389891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7390158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 25%] 2025-09-07T09:35:00.7390442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7390729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7390998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7391266Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7392512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7392780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 25%] 2025-09-07T09:35:00.7393063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7393343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 25%] 2025-09-07T09:35:00.7393610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 25%] 2025-09-07T09:35:00.7393877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7394144Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 25%] 2025-09-07T09:35:00.7394409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7394675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7394945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 26%] 2025-09-07T09:35:00.7395214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7395498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7395781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7396052Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7396323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7396672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7396942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7397211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0035s] [ 26%] 2025-09-07T09:35:00.7397506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7397792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7399030Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7399298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7399568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7399835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7400100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7400371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7400673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7400958Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7401224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7401496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7401768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7402038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7402306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7402590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7402870Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7403136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7403402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7403668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7403934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7404199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7404465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7405714Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 26%] 2025-09-07T09:35:00.7405998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7406264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7406604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7406876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7407144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7407410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7407710Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7407991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7408257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7408520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7408783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7409048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7409312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 26%] 2025-09-07T09:35:00.7409575Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7409855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7410140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 26%] 2025-09-07T09:35:00.7410409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7410674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7410941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7412194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7412465Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7412751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7413033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7413303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7413570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7413835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7414099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7414365Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7414631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7414919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7415204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7415472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7415740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7416008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 26%] 2025-09-07T09:35:00.7416272Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7416604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7416896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7417178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7417447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7417710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7419025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7419292Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7419556Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7419822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7420117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7420401Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7420664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 26%] 2025-09-07T09:35:00.7420933Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 26%] 2025-09-07T09:35:00.7421203Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7421469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7421733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7422014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7422294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7422559Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7422825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7423087Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0037s] [ 26%] 2025-09-07T09:35:00.7423353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7423618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7423879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7424143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7425392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7425672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7425933Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7426203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 26%] 2025-09-07T09:35:00.7426473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.7426811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7427076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 26%] 2025-09-07T09:35:00.7427375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7427660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 26%] 2025-09-07T09:35:00.7427924Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 26%] 2025-09-07T09:35:00.7428191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 26%] 2025-09-07T09:35:00.7428455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 26%] 2025-09-07T09:35:00.7428721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7428988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7429248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7429543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7429823Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7430085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7430348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7430616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7430885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7432122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7432412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7432697Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 26%] 2025-09-07T09:35:00.7432965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 26%] 2025-09-07T09:35:00.7433231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7433498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 26%] 2025-09-07T09:35:00.7433763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7434028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7434296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7434583Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7434866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7435130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7435394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7435659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7435935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7436209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7436578Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7436869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7437144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7437419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7438667Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7438942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7439212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7439484Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7439791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7440080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7440351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7440623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7440895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7441166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7441440Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7441730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7442017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7442291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7442567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7442846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7443121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7443394Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 27%] 2025-09-07T09:35:00.7443666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7443952Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7445205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7445476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7445750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7446026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7446302Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7446643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7446950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7447246Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7447521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7447793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7448070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7448348Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7448622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7448897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7449194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7449482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7449755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7450026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7450300Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7450572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7450842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 27%] 2025-09-07T09:35:00.7452109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 27%] 2025-09-07T09:35:00.7452490Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7452845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7453195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7453544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7453893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7454262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7454625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7454975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7455325Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7455672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7456015Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7456372Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7456792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7457138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 
2025-09-07T09:35:00.7457486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7457833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7458184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7458532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7458925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7460355Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7460710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7461064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7461416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7461801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7462164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7462510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7462859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7463206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7463554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7463902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7464267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0008s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7464623Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7464971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7465325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7465672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7466030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7466393Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 27%] 2025-09-07T09:35:00.7466802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7467152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7468478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7468827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7469174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7469560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7469922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7470270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7470618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7470965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7471336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 27%] 2025-09-07T09:35:00.7471631Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 27%] 2025-09-07T09:35:00.7471905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7472179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7472452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7472728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7473004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7473277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7473564Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7473847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7474115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7474388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7475626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7475903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7476174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 27%] 2025-09-07T09:35:00.7476463Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7476815Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7477094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 27%] 2025-09-07T09:35:00.7477367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 27%] 2025-09-07T09:35:00.7477640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 27%] 2025-09-07T09:35:00.7477915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7478191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 28%] 2025-09-07T09:35:00.7478470Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 28%] 2025-09-07T09:35:00.7478780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7479084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 28%] 2025-09-07T09:35:00.7479355Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 28%] 2025-09-07T09:35:00.7479628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 28%] 2025-09-07T09:35:00.7479903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7480172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 28%] 2025-09-07T09:35:00.7480444Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0034s] [ 28%] 2025-09-07T09:35:00.7480742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7481034Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 28%] 2025-09-07T09:35:00.7482280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 28%] 2025-09-07T09:35:00.7482554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 28%] 2025-09-07T09:35:00.7482834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7483109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 28%] 2025-09-07T09:35:00.7483381Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 28%] 2025-09-07T09:35:00.7483661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7483965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7484256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 28%] 2025-09-07T09:35:00.7484530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 28%] 2025-09-07T09:35:00.7484802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7485074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 28%] 2025-09-07T09:35:00.7485344Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 28%] 2025-09-07T09:35:00.7485612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 28%] 2025-09-07T09:35:00.7485900Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 28%] 2025-09-07T09:35:00.7486187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 28%] 2025-09-07T09:35:00.7486458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 28%] 2025-09-07T09:35:00.7486793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 28%] 2025-09-07T09:35:00.7487144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7487497Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7487846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7488222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7489578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7489928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7490281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 28%] 2025-09-07T09:35:00.7490631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7490976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7491365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7491730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7492074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7492423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7492772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7493114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7493470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7493837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7494188Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7494536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7494884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7495232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7495595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7495957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7496308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7497679Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7498028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7498375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7498753Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7499203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7499553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
28%] 2025-09-07T09:35:00.7499903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7500249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7500598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7500962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7501325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7501672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7502025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7502376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7502725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7503086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7503444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7503791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7504143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7504487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7505810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7506176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7506596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7506942Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 28%] 2025-09-07T09:35:00.7507216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0042s] [ 28%] 2025-09-07T09:35:00.7507490Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 28%] 2025-09-07T09:35:00.7507761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7508031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7508347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0020s] [ 28%] 2025-09-07T09:35:00.7508638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0020s] [ 28%] 2025-09-07T09:35:00.7508910Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7509180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7509452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 28%] 2025-09-07T09:35:00.7509722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0020s] [ 28%] 2025-09-07T09:35:00.7509989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7510278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7510568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0020s] [ 28%] 2025-09-07T09:35:00.7510837Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0020s] [ 28%] 2025-09-07T09:35:00.7511103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7511370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7512619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 28%] 2025-09-07T09:35:00.7512895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7513165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 28%] 2025-09-07T09:35:00.7513435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 28%] 2025-09-07T09:35:00.7513731Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7514021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7514291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7514563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7514833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7515102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7515383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7515663Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7515935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7516205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7516473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7516816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7517087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7517362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 28%] 2025-09-07T09:35:00.7517632Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7517925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 28%] 2025-09-07T09:35:00.7519186Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0021s] [ 28%] 2025-09-07T09:35:00.7519460Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 28%] 2025-09-07T09:35:00.7519735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7520008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 28%] 2025-09-07T09:35:00.7520277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7520572Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7520861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7521129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 28%] 2025-09-07T09:35:00.7521399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 28%] 2025-09-07T09:35:00.7521670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 29%] 2025-09-07T09:35:00.7521939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7522210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7522561Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7522928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7523288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7523632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7523980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7524328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 
2025-09-07T09:35:00.7524672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7525033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7525392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7526778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7527126Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7527469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 29%] 2025-09-07T09:35:00.7527814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7528190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7528553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7528894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7529241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7529591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7529935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7530303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7530672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7531020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7531367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7531714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7532057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7532416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7532802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7533143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7533488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7534815Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7535162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7535523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7535881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7536230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7536641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 
2025-09-07T09:35:00.7536987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7537335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7537684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7538060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7538426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7538770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7539177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7539518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7539884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7540249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7540593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7540939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7541286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7541564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 29%] 2025-09-07T09:35:00.7542825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 29%] 2025-09-07T09:35:00.7543125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7543418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 29%] 2025-09-07T09:35:00.7543697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7543977Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7544258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 29%] 2025-09-07T09:35:00.7544534Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7544806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 29%] 2025-09-07T09:35:00.7545106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7545396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7545668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7545943Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7546222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 29%] 2025-09-07T09:35:00.7546569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7546841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7547117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 29%] 2025-09-07T09:35:00.7547429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 29%] 2025-09-07T09:35:00.7547725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7548001Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7548281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7549541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7549821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7550095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7550399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7550691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7550967Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 29%] 2025-09-07T09:35:00.7551239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7551518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7551798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7552069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7552342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7552636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7552925Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7553204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 29%] 2025-09-07T09:35:00.7553479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7553760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7554038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7554314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7554602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.7554890Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7556125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7556400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7556741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 29%] 2025-09-07T09:35:00.7557016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7557293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 29%] 2025-09-07T09:35:00.7557567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 29%] 2025-09-07T09:35:00.7557876Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 29%] 2025-09-07T09:35:00.7558250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7558604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7558958Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7559307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7559680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7560047Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7560397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7560748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7561097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7561446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7561794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 29%] 2025-09-07T09:35:00.7562172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7562555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7562904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7564233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7564582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7564953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7565318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7565670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7566021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7566376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7566786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7567139Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7567522Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 29%] 2025-09-07T09:35:00.7567887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7568238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7568587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7568934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 30%] 2025-09-07T09:35:00.7569307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7569676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7570023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7570372Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7570724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7571074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7572405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7572792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7573161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7573513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7573864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7574215Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7574577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7574940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7575289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7575635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7575985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 30%] 2025-09-07T09:35:00.7576334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7576747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7577137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7577433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.7577710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 30%] 2025-09-07T09:35:00.7577986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7578261Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7578539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 30%] 2025-09-07T09:35:00.7578816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 30%] 2025-09-07T09:35:00.7580179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7580481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7580756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 30%] 2025-09-07T09:35:00.7581029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 30%] 2025-09-07T09:35:00.7581304Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7581577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7581850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 30%] 2025-09-07T09:35:00.7582125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 30%] 2025-09-07T09:35:00.7582423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7582747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7583029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7583309Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7583587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7583860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7584161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7584452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7584728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7585003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7585280Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7585554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7586892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7587167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7587484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7587787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7588059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7588332Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7588614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 30%] 2025-09-07T09:35:00.7588891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7589168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7589470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7589767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7590045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7590321Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7590597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7590873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7591147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7591418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.7591707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7591997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7592273Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 30%] 2025-09-07T09:35:00.7593523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7593804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.7594164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7594518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7594891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7595254Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7595603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7595955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7596308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7596715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7597093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 30%] 2025-09-07T09:35:00.7597474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7597821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7598167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7598520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7598870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7599245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7599615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7599965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7600317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7600672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7601999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7602376Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7602745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7603096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7603447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7603800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7604149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7604514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7604873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7605220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7605569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7605921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7606270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7606723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7607097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7607448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7607799Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7608151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7608505Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7608890Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7610255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7610605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7610956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7611305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 30%] 2025-09-07T09:35:00.7611652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7612023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7612388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7612736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7613086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 30%] 2025-09-07T09:35:00.7613363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 30%] 
2025-09-07T09:35:00.7613637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 30%] 2025-09-07T09:35:00.7613926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7614212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7614488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 31%] 2025-09-07T09:35:00.7614762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 31%] 2025-09-07T09:35:00.7615035Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 31%] 2025-09-07T09:35:00.7615308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7615577Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 31%] 2025-09-07T09:35:00.7615844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 31%] 2025-09-07T09:35:00.7616129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7617452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7617724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 31%] 2025-09-07T09:35:00.7617997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 31%] 2025-09-07T09:35:00.7618271Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 31%] 2025-09-07T09:35:00.7618545Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7618820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 31%] 2025-09-07T09:35:00.7619189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 31%] 2025-09-07T09:35:00.7619483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7619762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7620039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7620317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7620592Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7620864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7621135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7621422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 31%] 2025-09-07T09:35:00.7621709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 31%] 2025-09-07T09:35:00.7621980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7622251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7622524Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 31%] 2025-09-07T09:35:00.7622794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7624038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 31%] 2025-09-07T09:35:00.7624343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7624633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7624910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 31%] 2025-09-07T09:35:00.7625184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 31%] 2025-09-07T09:35:00.7625463Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 31%] 2025-09-07T09:35:00.7625740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 31%] 2025-09-07T09:35:00.7626014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7626288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7626668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7626959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7627229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7627499Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7627772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 31%] 2025-09-07T09:35:00.7628043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 31%] 2025-09-07T09:35:00.7628314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 31%] 2025-09-07T09:35:00.7628612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 31%] 2025-09-07T09:35:00.7628981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7629329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 31%] 2025-09-07T09:35:00.7629680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7631023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7631376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7631727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7632115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7632478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7632828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7633176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7633521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7633876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7634233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7634581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7634925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7635277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7635626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7635976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7636338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7636767Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7637118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7637470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7637819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7639224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7639588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 31%] 2025-09-07T09:35:00.7639939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7640284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7640631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7640979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7641325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7641693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7642057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7642407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7642757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7643110Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7643468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7643831Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7644181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7644529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7644877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7645225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7645571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 31%] 2025-09-07T09:35:00.7645930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7647319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7647669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7648018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7648366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7648737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 31%] 2025-09-07T09:35:00.7649043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 31%] 2025-09-07T09:35:00.7649322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 31%] 2025-09-07T09:35:00.7649600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0112s] [ 31%] 2025-09-07T09:35:00.7649875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0073s] [ 31%] 2025-09-07T09:35:00.7650156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 31%] 2025-09-07T09:35:00.7650434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 31%] 2025-09-07T09:35:00.7650713Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0110s] [ 31%] 
2025-09-07T09:35:00.7651012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0074s] [ 31%] 2025-09-07T09:35:00.7651304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 31%] 2025-09-07T09:35:00.7651577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0037s] [ 31%] 2025-09-07T09:35:00.7651850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0109s] [ 31%] 2025-09-07T09:35:00.7652125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0073s] [ 31%] 2025-09-07T09:35:00.7652402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 31%] 2025-09-07T09:35:00.7652677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0037s] [ 31%] 2025-09-07T09:35:00.7653002Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0109s] [ 31%] 2025-09-07T09:35:00.7653289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0072s] [ 31%] 2025-09-07T09:35:00.7654533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 31%] 2025-09-07T09:35:00.7654811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 31%] 2025-09-07T09:35:00.7655094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 31%] 2025-09-07T09:35:00.7655374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 31%] 2025-09-07T09:35:00.7655654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 31%] 2025-09-07T09:35:00.7655934Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 31%] 2025-09-07T09:35:00.7656240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 31%] 2025-09-07T09:35:00.7656587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 31%] 2025-09-07T09:35:00.7656860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0046s] [ 31%] 2025-09-07T09:35:00.7657137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 31%] 2025-09-07T09:35:00.7657413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0126s] [ 31%] 2025-09-07T09:35:00.7657690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0084s] [ 31%] 2025-09-07T09:35:00.7657964Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0045s] [ 31%] 2025-09-07T09:35:00.7658268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0045s] [ 32%] 2025-09-07T09:35:00.7658567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0126s] [ 32%] 2025-09-07T09:35:00.7658845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0085s] [ 32%] 2025-09-07T09:35:00.7659163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7659441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7659719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 32%] 2025-09-07T09:35:00.7659995Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0084s] [ 32%] 2025-09-07T09:35:00.7661251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7661558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7661858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 32%] 2025-09-07T09:35:00.7662134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 32%] 2025-09-07T09:35:00.7662409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7662687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0045s] [ 32%] 2025-09-07T09:35:00.7662962Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0128s] [ 32%] 2025-09-07T09:35:00.7663233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0085s] [ 32%] 2025-09-07T09:35:00.7663526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7663817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7664095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0126s] [ 32%] 2025-09-07T09:35:00.7664370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0084s] [ 32%] 2025-09-07T09:35:00.7664728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 
2025-09-07T09:35:00.7665083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7665436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7665801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7666169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7666573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7666924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7667276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7668627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7668995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7669347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7669698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7670048Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7670398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7670749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7671115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7671489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7671846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7672201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7672550Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7672917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7673293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7673645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7674002Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7674356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7674706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7675056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7675417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7676810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 32%] 2025-09-07T09:35:00.7677167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7677523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7677875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7678262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7678632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7678984Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7679334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7679698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7680052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7680405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7680773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7681140Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7681491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7681841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7682192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7682554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7682918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 32%] 2025-09-07T09:35:00.7683267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7683616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7684870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 32%] 2025-09-07T09:35:00.7685152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0042s] [ 32%] 2025-09-07T09:35:00.7685431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 32%] 2025-09-07T09:35:00.7685708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0082s] [ 32%] 2025-09-07T09:35:00.7686010Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 32%] 2025-09-07T09:35:00.7686303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0041s] [ 32%] 2025-09-07T09:35:00.7686639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0123s] [ 32%] 2025-09-07T09:35:00.7686919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0080s] [ 32%] 2025-09-07T09:35:00.7687193Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0040s] [ 32%] 2025-09-07T09:35:00.7687468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0040s] [ 32%] 2025-09-07T09:35:00.7687781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0124s] [ 32%] 2025-09-07T09:35:00.7688075Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0080s] [ 32%] 2025-09-07T09:35:00.7688351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0040s] [ 32%] 2025-09-07T09:35:00.7688626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0040s] [ 32%] 2025-09-07T09:35:00.7688900Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0122s] [ 32%] 2025-09-07T09:35:00.7689175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0080s] [ 32%] 2025-09-07T09:35:00.7689453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7689731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7690029Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0134s] [ 32%] 2025-09-07T09:35:00.7690325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0089s] [ 32%] 2025-09-07T09:35:00.7690603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0047s] [ 32%] 2025-09-07T09:35:00.7691854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7692134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0134s] [ 32%] 2025-09-07T09:35:00.7692417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0089s] [ 32%] 2025-09-07T09:35:00.7692695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7692994Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7693287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0138s] [ 32%] 2025-09-07T09:35:00.7693560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0091s] [ 32%] 2025-09-07T09:35:00.7693836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0045s] [ 32%] 2025-09-07T09:35:00.7694111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7694389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0139s] [ 32%] 2025-09-07T09:35:00.7694664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0090s] [ 32%] 2025-09-07T09:35:00.7694944Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7695240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7695533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0135s] [ 32%] 2025-09-07T09:35:00.7695810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0089s] [ 32%] 2025-09-07T09:35:00.7696090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7696371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7696725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0135s] [ 32%] 2025-09-07T09:35:00.7697001Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0090s] [ 32%] 2025-09-07T09:35:00.7697305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7698569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7698846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0139s] [ 32%] 2025-09-07T09:35:00.7699169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0090s] [ 32%] 2025-09-07T09:35:00.7699447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7699724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 32%] 2025-09-07T09:35:00.7699999Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0138s] [ 32%] 2025-09-07T09:35:00.7700276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0090s] [ 32%] 2025-09-07T09:35:00.7700662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7701041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7701394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7701748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7702102Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7702472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7702842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 32%] 2025-09-07T09:35:00.7703194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7703544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7703896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7704243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7704591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7704982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7706314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7706729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7707081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7707436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7707813Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7708187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7708541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7708897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7709254Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7709608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7709962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7710336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7710710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7711059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7711407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7711758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7712126Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7712486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7712836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7713189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7714519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7714873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7715225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7715599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7715969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7716327Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7716760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7717111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7717491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7717859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7718208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 33%] 2025-09-07T09:35:00.7718561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7718911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7719259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7719608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7719906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.7720197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 33%] 2025-09-07T09:35:00.7720469Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7720746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 33%] 2025-09-07T09:35:00.7721025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0021s] [ 33%] 2025-09-07T09:35:00.7722282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 33%] 2025-09-07T09:35:00.7722558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7722850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.7723139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 33%] 2025-09-07T09:35:00.7723409Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 33%] 2025-09-07T09:35:00.7723682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7723953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.7724226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0021s] [ 33%] 2025-09-07T09:35:00.7724499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 33%] 2025-09-07T09:35:00.7724772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7725071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.7725361Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7725636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 33%] 2025-09-07T09:35:00.7725910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7726187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7726463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 33%] 2025-09-07T09:35:00.7726830Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7727125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7727402Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7727673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7727944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7729199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7729474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7729748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 33%] 2025-09-07T09:35:00.7730022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7730331Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7730621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7730894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 33%] 2025-09-07T09:35:00.7731170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7731442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7731718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 33%] 2025-09-07T09:35:00.7732011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 33%] 2025-09-07T09:35:00.7732301Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 33%] 2025-09-07T09:35:00.7732577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7732851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 33%] 2025-09-07T09:35:00.7733123Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7733395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7733666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7733943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7734236Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7734523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 33%] 2025-09-07T09:35:00.7735760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0045s] [ 33%] 2025-09-07T09:35:00.7736033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.7736387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7736798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7737184Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7737551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7737903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7738253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7738607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7739008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 33%] 2025-09-07T09:35:00.7739358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7739729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7740092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7740438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7740787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7741135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7741481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7741846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7742212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7742563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7743894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7744246Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7744598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7744973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7745341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7745690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7746039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 33%] 2025-09-07T09:35:00.7746386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7746818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7747188Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7747563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7747912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7748259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7748606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7748956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 33%] 2025-09-07T09:35:00.7749330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7749698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7750046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7750400Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7750751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7752071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7752460Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7752828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7753175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 34%] 2025-09-07T09:35:00.7753523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7753868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7754216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7754586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7754955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7755304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7755582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7755855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 34%] 2025-09-07T09:35:00.7756129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7756416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7756777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7757052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7757326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 34%] 
2025-09-07T09:35:00.7757599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7757871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 34%] 2025-09-07T09:35:00.7758142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 34%] 2025-09-07T09:35:00.7759382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7759654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7759962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 34%] 2025-09-07T09:35:00.7760257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 34%] 2025-09-07T09:35:00.7760529Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7760803Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7761078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7761353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 34%] 2025-09-07T09:35:00.7761666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7761957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7762236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7762512Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7762789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7763065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7763337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7763607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7763895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7764179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7764452Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7764724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7764997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7766242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7766584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7766901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7767229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7767506Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7767783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7768059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7768335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7768608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7768879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7769178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7769467Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7769737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7770009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7770284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7770557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.7770827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.7771192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7771556Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7771907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7773229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7773585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7773940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7774289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 34%] 2025-09-07T09:35:00.7774658Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7775019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7775370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7775716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7776062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7776426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7776848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7777195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7777542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7777894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7778245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7778595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7779038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7779419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7779770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7780122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7781454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7781834Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7782201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7782551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7782898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7783246Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7783595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
34%] 2025-09-07T09:35:00.7783943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7784308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7784678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7785028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7785380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7785728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7786094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7786463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7786889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7787239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7787588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7787935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7788282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7789644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7790014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7790365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7790713Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7791062Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 34%] 2025-09-07T09:35:00.7791338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 34%] 2025-09-07T09:35:00.7791635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 34%] 2025-09-07T09:35:00.7791928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 34%] 2025-09-07T09:35:00.7792200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7792477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 34%] 2025-09-07T09:35:00.7792752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 34%] 2025-09-07T09:35:00.7793025Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 34%] 2025-09-07T09:35:00.7793298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 34%] 2025-09-07T09:35:00.7793569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0021s] [ 34%] 2025-09-07T09:35:00.7793854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 34%] 2025-09-07T09:35:00.7794141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0022s] [ 34%] 2025-09-07T09:35:00.7794410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 34%] 2025-09-07T09:35:00.7794684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0021s] [ 35%] 2025-09-07T09:35:00.7794958Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 35%] 2025-09-07T09:35:00.7795228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 35%] 2025-09-07T09:35:00.7796462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7796833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7797140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7797417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7797690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 35%] 2025-09-07T09:35:00.7797970Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7798247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7798521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7798796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7799096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 35%] 2025-09-07T09:35:00.7799386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7799659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 35%] 2025-09-07T09:35:00.7799933Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.7800206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7800478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7800747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7801037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 35%] 2025-09-07T09:35:00.7801327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7801601Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7801880Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7802153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7803411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7803688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7803964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7804275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7804567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7804841Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7805114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0032s] [ 35%] 2025-09-07T09:35:00.7805386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 35%] 2025-09-07T09:35:00.7805656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.7805945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 35%] 2025-09-07T09:35:00.7806262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0031s] [ 35%] 2025-09-07T09:35:00.7806602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 35%] 2025-09-07T09:35:00.7806955Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7807311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7807664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7808010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7808396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7808764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) 
[ 35%] 2025-09-07T09:35:00.7809113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7809464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7810785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7811133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7811520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7811884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7812234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7812581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7812930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0008s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7813275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7813674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7814040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7814386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7814735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7815089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7815441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7815809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7816176Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7816584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7816931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7817280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7817630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7818949Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
35%] 2025-09-07T09:35:00.7819402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7819768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7820114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7820468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7820821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7821194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7821567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7821917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7822268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7822618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7822970Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7823318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7823680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7824037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7824382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7824732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7825081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7825449Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7825808Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 35%] 2025-09-07T09:35:00.7827125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7827397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 35%] 2025-09-07T09:35:00.7827672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7827943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7828214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED 
[0.0023s] [ 35%] 2025-09-07T09:35:00.7828488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 35%] 2025-09-07T09:35:00.7828801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7829092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7829361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 35%] 2025-09-07T09:35:00.7829629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 35%] 2025-09-07T09:35:00.7829894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7830160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7830426Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 35%] 2025-09-07T09:35:00.7830732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 35%] 2025-09-07T09:35:00.7831018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.7831285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7831555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7831827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7832099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7832365Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7833604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7833898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7834187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7834457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7834724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7834996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7835263Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7835530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7835813Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7836098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7836367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7836702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.7836975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7837248Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7837517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 35%] 2025-09-07T09:35:00.7837786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.7838098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7838385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 35%] 2025-09-07T09:35:00.7838659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 36%] 2025-09-07T09:35:00.7838931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.7839200Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 36%] 2025-09-07T09:35:00.7840435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 36%] 2025-09-07T09:35:00.7840702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7840994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.7841283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 36%] 2025-09-07T09:35:00.7841557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 36%] 2025-09-07T09:35:00.7841826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.7842095Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.7842444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7842791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7843152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7843512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7843860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7844212Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7844557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7844902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7845259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7845615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7845954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 
2025-09-07T09:35:00.7846296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7846723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7848046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7848390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7848764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7849130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 36%] 2025-09-07T09:35:00.7849480Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7849827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7850171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7850546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7850910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7851257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7851604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7851948Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7852290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7852630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7852986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7853342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7853688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7854034Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7854377Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7854754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7856077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7856426Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7856843Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7857194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7857546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7857894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7858280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
36%] 2025-09-07T09:35:00.7858649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7859052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7859398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7859740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7860101Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7860463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7860804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7861148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7861431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 36%] 2025-09-07T09:35:00.7861709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.7861985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 36%] 2025-09-07T09:35:00.7862260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 36%] 2025-09-07T09:35:00.7862558Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7863829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7864108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 36%] 2025-09-07T09:35:00.7864388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 36%] 2025-09-07T09:35:00.7864663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7864936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7865210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0044s] [ 36%] 2025-09-07T09:35:00.7865504Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0034s] [ 36%] 2025-09-07T09:35:00.7865792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7866066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.7866338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0043s] [ 36%] 2025-09-07T09:35:00.7866675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0035s] [ 36%] 2025-09-07T09:35:00.7866953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 36%] 2025-09-07T09:35:00.7867232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7867509Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 36%] 2025-09-07T09:35:00.7867819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 36%] 2025-09-07T09:35:00.7868116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7868391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.7868669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 36%] 2025-09-07T09:35:00.7868945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 36%] 2025-09-07T09:35:00.7869216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7870486Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.7870786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0049s] [ 36%] 2025-09-07T09:35:00.7871061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0038s] [ 36%] 2025-09-07T09:35:00.7871335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7871610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 36%] 2025-09-07T09:35:00.7871886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0049s] [ 36%] 2025-09-07T09:35:00.7872159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0038s] [ 36%] 2025-09-07T09:35:00.7872437Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7872734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.7873027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0050s] [ 36%] 2025-09-07T09:35:00.7873300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 36%] 2025-09-07T09:35:00.7873577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7873858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.7874132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 36%] 2025-09-07T09:35:00.7874406Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0039s] [ 36%] 2025-09-07T09:35:00.7874695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7875029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7875301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 36%] 2025-09-07T09:35:00.7875572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 36%] 2025-09-07T09:35:00.7875847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7876123Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.7877430Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0049s] [ 36%] 2025-09-07T09:35:00.7877703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 36%] 2025-09-07T09:35:00.7878108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7878482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7878835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7879187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7879540Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7879891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7880259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7880633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7880981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7881330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 36%] 2025-09-07T09:35:00.7881676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7882020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7882385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7882748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7883097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7883444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7883795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 36%] 2025-09-07T09:35:00.7884146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7885486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7885853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7886212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7886627Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7886979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7887328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7887705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7888073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7888426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 37%] 2025-09-07T09:35:00.7888774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7889125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7889473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7889840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7890206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7890558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7890912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7891262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7891612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7891979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7892347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7893696Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7894054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7894406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7894756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7895121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7895488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 37%] 2025-09-07T09:35:00.7895837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7896188Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7896593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7896940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7897260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 37%] 2025-09-07T09:35:00.7897555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.7897832Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 37%] 2025-09-07T09:35:00.7898107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 37%] 2025-09-07T09:35:00.7898387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.7898667Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.7898942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0046s] [ 37%] 2025-09-07T09:35:00.7899324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 37%] 2025-09-07T09:35:00.7899614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 37%] 2025-09-07T09:35:00.7899890Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.7901212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0045s] [ 37%] 2025-09-07T09:35:00.7901491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 37%] 2025-09-07T09:35:00.7901768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 37%] 2025-09-07T09:35:00.7902042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.7902315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0046s] [ 37%] 2025-09-07T09:35:00.7902606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 37%] 2025-09-07T09:35:00.7902902Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 37%] 2025-09-07T09:35:00.7903179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 37%] 2025-09-07T09:35:00.7903454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0050s] [ 37%] 2025-09-07T09:35:00.7903730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 37%] 2025-09-07T09:35:00.7904011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7904291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7904587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0050s] [ 37%] 2025-09-07T09:35:00.7904878Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 37%] 2025-09-07T09:35:00.7905152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7905428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7905702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 37%] 2025-09-07T09:35:00.7905976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 37%] 2025-09-07T09:35:00.7906254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7906593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7907942Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 37%] 2025-09-07T09:35:00.7908241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0040s] [ 37%] 2025-09-07T09:35:00.7908520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7908798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7909079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0050s] [ 37%] 2025-09-07T09:35:00.7909356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 37%] 2025-09-07T09:35:00.7909634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7909934Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7910243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0049s] [ 37%] 2025-09-07T09:35:00.7910517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 37%] 2025-09-07T09:35:00.7910790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7911065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7911338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0049s] [ 37%] 2025-09-07T09:35:00.7911610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 37%] 2025-09-07T09:35:00.7911889Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7912183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.7912469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0050s] [ 37%] 2025-09-07T09:35:00.7912740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0040s] [ 37%] 2025-09-07T09:35:00.7913095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7913449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7913800Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7915143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7915516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7915870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7916221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7916629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 37%] 2025-09-07T09:35:00.7916976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7917358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7917726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7918073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7918423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7918773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7919118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7919504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7919876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7920228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7920580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7920929Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7921281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7921648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7922015Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7923345Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7923698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7924050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7924396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7924774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7925140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7925489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7925837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7926185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7926595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7926979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7927347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7927699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7928054Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7928407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7928757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7929128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7929495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7929848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7930197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 37%] 2025-09-07T09:35:00.7931521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7931870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7932241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7932606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7932956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7933235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 38%] 2025-09-07T09:35:00.7933512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 38%] 2025-09-07T09:35:00.7933785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.7934060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.7934350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.7934639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.7934911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 
38%] 2025-09-07T09:35:00.7935187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.7935458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.7935729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.7935998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 38%] 2025-09-07T09:35:00.7936265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7936630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.7936923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.7937194Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 38%] 2025-09-07T09:35:00.7938441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.7938720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7939086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7939358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7939662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7939958Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7940234Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7940509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7940786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7941058Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7941328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7941598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7941882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7942168Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7942440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7942711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7942985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.7943259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7943531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7943819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7945081Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7945358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7945636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7945914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7946191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7946462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7946804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7947101Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 38%] 2025-09-07T09:35:00.7947391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7947663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7947934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.7948206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7948479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.7948849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7949219Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7949565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7949912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7950264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7950617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7950968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 38%] 2025-09-07T09:35:00.7951382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7952713Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7953059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7953406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7953752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7954117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7954479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7954825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7955170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7955521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7955870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7956224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7956646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7957011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7957362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7957711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7958060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7958443Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7958825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7959175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7959519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7960994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7961341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
38%] 2025-09-07T09:35:00.7961685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7962053Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7962449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7962798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7963149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7963496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7963845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7964210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7964582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7964932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7965279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7965624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7965968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7966330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7966770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7967117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7967464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7967810Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 38%] 2025-09-07T09:35:00.7969079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0040s] [ 38%] 2025-09-07T09:35:00.7969389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 38%] 2025-09-07T09:35:00.7969688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0104s] [ 38%] 2025-09-07T09:35:00.7969965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0086s] [ 38%] 2025-09-07T09:35:00.7970244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 38%] 2025-09-07T09:35:00.7970526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 38%] 
2025-09-07T09:35:00.7970803Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0103s] [ 38%] 2025-09-07T09:35:00.7971079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0086s] [ 38%] 2025-09-07T09:35:00.7971353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 38%] 2025-09-07T09:35:00.7971650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 38%] 2025-09-07T09:35:00.7971943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0103s] [ 38%] 2025-09-07T09:35:00.7972218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0084s] [ 38%] 2025-09-07T09:35:00.7972495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0039s] [ 38%] 2025-09-07T09:35:00.7972775Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 38%] 2025-09-07T09:35:00.7973051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0102s] [ 38%] 2025-09-07T09:35:00.7973324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0084s] [ 38%] 2025-09-07T09:35:00.7973620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 38%] 2025-09-07T09:35:00.7973920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 38%] 2025-09-07T09:35:00.7974199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0112s] [ 38%] 2025-09-07T09:35:00.7974479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0095s] [ 38%] 2025-09-07T09:35:00.7975727Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 38%] 2025-09-07T09:35:00.7976011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 38%] 2025-09-07T09:35:00.7976288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0112s] [ 39%] 2025-09-07T09:35:00.7976623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0094s] [ 39%] 2025-09-07T09:35:00.7976932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0044s] [ 39%] 2025-09-07T09:35:00.7977232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0046s] [ 39%] 2025-09-07T09:35:00.7977509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0111s] [ 39%] 2025-09-07T09:35:00.7977788Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0093s] [ 39%] 2025-09-07T09:35:00.7978066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7978343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7978647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0112s] [ 39%] 2025-09-07T09:35:00.7978940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0094s] [ 39%] 2025-09-07T09:35:00.7979275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 39%] 2025-09-07T09:35:00.7979553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7979832Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0111s] [ 39%] 2025-09-07T09:35:00.7980114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0094s] [ 39%] 2025-09-07T09:35:00.7980393Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7980672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7980967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0111s] [ 39%] 2025-09-07T09:35:00.7981259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0094s] [ 39%] 2025-09-07T09:35:00.7982509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7982786Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7983066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0111s] [ 39%] 2025-09-07T09:35:00.7983343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0093s] [ 39%] 2025-09-07T09:35:00.7983619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7983914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.7984207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0111s] [ 39%] 2025-09-07T09:35:00.7984485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0093s] [ 39%] 2025-09-07T09:35:00.7984839Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7985198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7985554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7985905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7986284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7986712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7987064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7987415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7987769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7988118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7988500Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7988868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7989219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7990547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7990903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7991252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7991628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7992000Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7992353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7992709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7993065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7993422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7993796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7994163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7994514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7994868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7995219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7995569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7995935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7996301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7996719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7997071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7997424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7998748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7999142Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7999514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.7999872Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8000232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8000588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8000947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8001319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8001688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8002036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8002388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8002740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8003105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8003471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8003824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 39%] 2025-09-07T09:35:00.8004106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0039s] [ 39%] 2025-09-07T09:35:00.8004387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8004666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0102s] [ 39%] 2025-09-07T09:35:00.8004943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0078s] [ 39%] 2025-09-07T09:35:00.8005223Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8006545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8006852Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0101s] [ 39%] 2025-09-07T09:35:00.8007127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0080s] [ 39%] 2025-09-07T09:35:00.8007403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8007681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8007956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0101s] [ 39%] 2025-09-07T09:35:00.8008227Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0078s] [ 39%] 2025-09-07T09:35:00.8008533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8008832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.8009108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0102s] [ 39%] 2025-09-07T09:35:00.8009384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0078s] [ 39%] 2025-09-07T09:35:00.8009665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0042s] [ 39%] 2025-09-07T09:35:00.8009945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0042s] [ 39%] 2025-09-07T09:35:00.8010225Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0110s] [ 39%] 2025-09-07T09:35:00.8010505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 39%] 2025-09-07T09:35:00.8010812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8011144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0042s] [ 39%] 2025-09-07T09:35:00.8011425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0110s] [ 39%] 2025-09-07T09:35:00.8011702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 39%] 2025-09-07T09:35:00.8011978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8013241Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8013518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0109s] [ 39%] 2025-09-07T09:35:00.8013809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0083s] [ 39%] 2025-09-07T09:35:00.8014103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8014381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8014658Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0109s] [ 39%] 2025-09-07T09:35:00.8014936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0085s] [ 39%] 2025-09-07T09:35:00.8015216Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0042s] [ 39%] 2025-09-07T09:35:00.8015494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8015770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0110s] [ 39%] 2025-09-07T09:35:00.8016065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 39%] 2025-09-07T09:35:00.8016363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8016711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8016991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0110s] [ 39%] 2025-09-07T09:35:00.8017269Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0085s] [ 39%] 2025-09-07T09:35:00.8017544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8017842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8018136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0110s] [ 39%] 2025-09-07T09:35:00.8018412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0084s] [ 39%] 2025-09-07T09:35:00.8018689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.8020019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0042s] [ 39%] 2025-09-07T09:35:00.8020299Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0110s] [ 39%] 2025-09-07T09:35:00.8020576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0084s] [ 39%] 2025-09-07T09:35:00.8020932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8021321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8021698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8022050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8022411Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8022765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8023129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8023495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8023844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8024195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8024547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8024898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8025249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8025613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8025977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8026325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8026759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8028090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8028475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8028846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8029206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8029562Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8029916Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8030270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8030624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8031000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8031369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8031720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8032071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8032427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8032789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8033153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8033508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8033861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8034216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8034570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8034927Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8036280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8036789Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8037143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8037498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8037853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8038235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8038634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 40%] 2025-09-07T09:35:00.8038988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8039340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8039690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8040041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8040319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 40%] 2025-09-07T09:35:00.8040629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8040920Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 40%] 2025-09-07T09:35:00.8041195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 40%] 2025-09-07T09:35:00.8041476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8041754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8042030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0048s] [ 40%] 2025-09-07T09:35:00.8042304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0040s] [ 40%] 2025-09-07T09:35:00.8042592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8042878Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8044166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0048s] [ 40%] 2025-09-07T09:35:00.8044437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 40%] 2025-09-07T09:35:00.8044711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8044986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.8045258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0048s] [ 40%] 2025-09-07T09:35:00.8045529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0039s] [ 40%] 2025-09-07T09:35:00.8045826Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.8046121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 40%] 2025-09-07T09:35:00.8046394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0057s] [ 40%] 2025-09-07T09:35:00.8046743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0045s] [ 40%] 2025-09-07T09:35:00.8047023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8047302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8047579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0056s] [ 40%] 2025-09-07T09:35:00.8047884Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0046s] [ 40%] 2025-09-07T09:35:00.8048185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.8048458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8048733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0056s] [ 40%] 2025-09-07T09:35:00.8049008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0045s] [ 40%] 2025-09-07T09:35:00.8049284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8049558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.8050823Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0056s] [ 40%] 2025-09-07T09:35:00.8051134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0046s] [ 40%] 2025-09-07T09:35:00.8051433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 40%] 2025-09-07T09:35:00.8051707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8051982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0056s] [ 40%] 2025-09-07T09:35:00.8052259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 40%] 2025-09-07T09:35:00.8052537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 40%] 2025-09-07T09:35:00.8052814Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8053110Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0055s] [ 40%] 2025-09-07T09:35:00.8053405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 40%] 2025-09-07T09:35:00.8053680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8053951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 40%] 2025-09-07T09:35:00.8054227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0054s] [ 40%] 2025-09-07T09:35:00.8054500Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0044s] [ 40%] 2025-09-07T09:35:00.8054772Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 40%] 2025-09-07T09:35:00.8055046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 40%] 2025-09-07T09:35:00.8055332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0054s] [ 40%] 2025-09-07T09:35:00.8055617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0044s] [ 40%] 2025-09-07T09:35:00.8055972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8056324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8057725Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8058076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8058465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8058849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8059260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8059615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 40%] 2025-09-07T09:35:00.8059963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8060310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8060698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8061064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8061416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8061768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8062116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8062462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8062864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8063231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8063580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8063933Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8064285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8064636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8065979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8066352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8066769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 40%] 2025-09-07T09:35:00.8067124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8067471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 40%] 2025-09-07T09:35:00.8067817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8068208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8068577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8068924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8069275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8069629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8069980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8070351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8070717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8071067Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8071422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8071778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8072133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8072496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8072856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 41%] 2025-09-07T09:35:00.8074176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8074524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8074882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8075234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8075618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8075981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 41%] 2025-09-07T09:35:00.8076262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2606s] [ 41%] 2025-09-07T09:35:00.8076607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0184s] [ 41%] 2025-09-07T09:35:00.8076889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0548s] [ 41%] 2025-09-07T09:35:00.8077170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0378s] [ 41%] 2025-09-07T09:35:00.8077452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0182s] [ 41%] 2025-09-07T09:35:00.8077763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0179s] [ 41%] 2025-09-07T09:35:00.8078059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED 
[0.0547s] [ 41%] 2025-09-07T09:35:00.8078334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0380s] [ 41%] 2025-09-07T09:35:00.8078609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0180s] [ 41%] 2025-09-07T09:35:00.8078884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0179s] [ 41%] 2025-09-07T09:35:00.8079160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0829s] [ 41%] 2025-09-07T09:35:00.8079434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.2207s] [ 41%] 2025-09-07T09:35:00.8079710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1339s] [ 41%] 2025-09-07T09:35:00.8080003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0899s] [ 41%] 2025-09-07T09:35:00.8080297Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1093s] [ 41%] 2025-09-07T09:35:00.8081546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.2397s] [ 41%] 2025-09-07T09:35:00.8081827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.1390s] [ 41%] 2025-09-07T09:35:00.8082107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1060s] [ 41%] 2025-09-07T09:35:00.8082388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2545s] [ 41%] 2025-09-07T09:35:00.8082664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3256s] [ 41%] 2025-09-07T09:35:00.8082968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.1659s] [ 41%] 2025-09-07T09:35:00.8083267Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0239s] [ 41%] 2025-09-07T09:35:00.8083546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1043s] [ 41%] 2025-09-07T09:35:00.8083824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5834s] [ 41%] 2025-09-07T09:35:00.8084100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3029s] [ 41%] 2025-09-07T09:35:00.8084376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1935s] [ 41%] 2025-09-07T09:35:00.8084652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2186s] [ 41%] 2025-09-07T09:35:00.8084928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.6058s] [ 41%] 2025-09-07T09:35:00.8085233Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1832s] [ 41%] 2025-09-07T09:35:00.8085528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.2238s] [ 41%] 2025-09-07T09:35:00.8085802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2459s] [ 41%] 2025-09-07T09:35:00.8086077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5355s] [ 41%] 2025-09-07T09:35:00.8086356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3569s] [ 41%] 2025-09-07T09:35:00.8086694Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.2293s] [ 41%] 2025-09-07T09:35:00.8086994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1976s] [ 41%] 2025-09-07T09:35:00.8088262Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5037s] [ 41%] 2025-09-07T09:35:00.8088549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3432s] [ 41%] 2025-09-07T09:35:00.8088831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.2291s] [ 41%] 2025-09-07T09:35:00.8089108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1979s] [ 41%] 2025-09-07T09:35:00.8089388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5030s] [ 41%] 2025-09-07T09:35:00.8089664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3441s] [ 41%] 2025-09-07T09:35:00.8089941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.2290s] [ 41%] 2025-09-07T09:35:00.8090248Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1983s] [ 41%] 2025-09-07T09:35:00.8090542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5033s] [ 41%] 2025-09-07T09:35:00.8090818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3776s] [ 41%] 2025-09-07T09:35:00.8091095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1793s] [ 41%] 2025-09-07T09:35:00.8091371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1974s] [ 41%] 2025-09-07T09:35:00.8091647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5510s] [ 41%] 2025-09-07T09:35:00.8091924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3410s] [ 41%] 2025-09-07T09:35:00.8092217Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1564s] [ 41%] 2025-09-07T09:35:00.8092507Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1231s] [ 41%] 2025-09-07T09:35:00.8092783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3153s] [ 41%] 2025-09-07T09:35:00.8093059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3014s] [ 41%] 2025-09-07T09:35:00.8093338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1578s] [ 41%] 2025-09-07T09:35:00.8093614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1569s] [ 41%] 2025-09-07T09:35:00.8094847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3998s] [ 41%] 2025-09-07T09:35:00.8095121Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1771s] [ 41%] 2025-09-07T09:35:00.8095415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1130s] [ 41%] 2025-09-07T09:35:00.8095708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1999s] [ 41%] 2025-09-07T09:35:00.8095978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4785s] [ 41%] 2025-09-07T09:35:00.8096255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1416s] [ 41%] 2025-09-07T09:35:00.8096603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1203s] [ 41%] 2025-09-07T09:35:00.8096879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1233s] [ 41%] 2025-09-07T09:35:00.8097150Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.3937s] [ 41%] 2025-09-07T09:35:00.8097454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2687s] [ 41%] 2025-09-07T09:35:00.8097750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.2195s] [ 41%] 2025-09-07T09:35:00.8098031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2088s] [ 41%] 2025-09-07T09:35:00.8098307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4899s] [ 41%] 2025-09-07T09:35:00.8098588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3612s] [ 41%] 2025-09-07T09:35:00.8098867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.2224s] [ 41%] 2025-09-07T09:35:00.8099199Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2281s] [ 41%] 2025-09-07T09:35:00.8099476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5770s] [ 41%] 2025-09-07T09:35:00.8099774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3064s] [ 41%] 2025-09-07T09:35:00.8100066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1845s] [ 41%] 2025-09-07T09:35:00.8100343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2176s] [ 41%] 2025-09-07T09:35:00.8100616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5997s] [ 41%] 2025-09-07T09:35:00.8101881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2676s] [ 41%] 2025-09-07T09:35:00.8102159Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.2361s] [ 41%] 2025-09-07T09:35:00.8102435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2720s] [ 41%] 2025-09-07T09:35:00.8102729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5237s] [ 41%] 2025-09-07T09:35:00.8103020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3579s] [ 41%] 2025-09-07T09:35:00.8103297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1494s] [ 41%] 2025-09-07T09:35:00.8103577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1967s] [ 41%] 2025-09-07T09:35:00.8103856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5837s] [ 41%] 2025-09-07T09:35:00.8104135Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3483s] [ 41%] 2025-09-07T09:35:00.8104412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1481s] [ 41%] 2025-09-07T09:35:00.8104687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1963s] [ 41%] 2025-09-07T09:35:00.8104985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5850s] [ 41%] 2025-09-07T09:35:00.8105273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3380s] [ 41%] 2025-09-07T09:35:00.8105547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1530s] [ 41%] 2025-09-07T09:35:00.8105823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1966s] [ 41%] 2025-09-07T09:35:00.8106096Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5797s] [ 41%] 2025-09-07T09:35:00.8106371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2443s] [ 41%] 2025-09-07T09:35:00.8106728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.2220s] [ 41%] 2025-09-07T09:35:00.8107020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1973s] [ 41%] 2025-09-07T09:35:00.8107296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5098s] [ 41%] 2025-09-07T09:35:00.8108549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3448s] [ 41%] 2025-09-07T09:35:00.8108833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1351s] [ 41%] 2025-09-07T09:35:00.8109118Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1357s] [ 41%] 2025-09-07T09:35:00.8109397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3296s] [ 41%] 2025-09-07T09:35:00.8109678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2375s] [ 41%] 2025-09-07T09:35:00.8109979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1341s] [ 42%] 2025-09-07T09:35:00.8110276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1806s] [ 42%] 2025-09-07T09:35:00.8110552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3927s] [ 42%] 2025-09-07T09:35:00.8110827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1907s] [ 42%] 2025-09-07T09:35:00.8111104Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0952s] [ 42%] 2025-09-07T09:35:00.8111379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1246s] [ 42%] 2025-09-07T09:35:00.8111651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4813s] [ 42%] 2025-09-07T09:35:00.8111943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2608s] [ 42%] 2025-09-07T09:35:00.8112235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0915s] [ 42%] 2025-09-07T09:35:00.8112513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1476s] [ 42%] 2025-09-07T09:35:00.8112789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.3253s] [ 42%] 2025-09-07T09:35:00.8113070Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2758s] [ 42%] 2025-09-07T09:35:00.8113352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1718s] [ 42%] 2025-09-07T09:35:00.8113629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2213s] [ 42%] 2025-09-07T09:35:00.8113905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4080s] [ 42%] 2025-09-07T09:35:00.8115177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3188s] [ 42%] 2025-09-07T09:35:00.8115485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1815s] [ 42%] 2025-09-07T09:35:00.8115765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1827s] [ 42%] 2025-09-07T09:35:00.8116043Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5752s] [ 42%] 2025-09-07T09:35:00.8116323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3878s] [ 42%] 2025-09-07T09:35:00.8116675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1538s] [ 42%] 2025-09-07T09:35:00.8116950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2293s] [ 42%] 2025-09-07T09:35:00.8117267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5794s] [ 42%] 2025-09-07T09:35:00.8117563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2802s] [ 42%] 2025-09-07T09:35:00.8117838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1648s] [ 42%] 2025-09-07T09:35:00.8118112Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2181s] [ 42%] 2025-09-07T09:35:00.8118390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5753s] [ 42%] 2025-09-07T09:35:00.8118670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3692s] [ 42%] 2025-09-07T09:35:00.8118950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1641s] [ 42%] 2025-09-07T09:35:00.8119227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2287s] [ 42%] 2025-09-07T09:35:00.8119538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5738s] [ 42%] 2025-09-07T09:35:00.8119837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3440s] [ 42%] 2025-09-07T09:35:00.8120118Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1737s] [ 42%] 2025-09-07T09:35:00.8120398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2288s] [ 42%] 2025-09-07T09:35:00.8120677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5621s] [ 42%] 2025-09-07T09:35:00.8120955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3438s] [ 42%] 2025-09-07T09:35:00.8122216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1738s] [ 42%] 2025-09-07T09:35:00.8122513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2292s] [ 42%] 2025-09-07T09:35:00.8122814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5602s] [ 42%] 2025-09-07T09:35:00.8123091Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3455s] [ 42%] 2025-09-07T09:35:00.8123367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1740s] [ 42%] 2025-09-07T09:35:00.8123643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2294s] [ 42%] 2025-09-07T09:35:00.8123917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5642s] [ 42%] 2025-09-07T09:35:00.8124195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3485s] [ 42%] 2025-09-07T09:35:00.8124474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1487s] [ 42%] 2025-09-07T09:35:00.8124768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1771s] [ 42%] 2025-09-07T09:35:00.8125058Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4812s] [ 42%] 2025-09-07T09:35:00.8125335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2329s] [ 42%] 2025-09-07T09:35:00.8125614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1333s] [ 42%] 2025-09-07T09:35:00.8125892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1812s] [ 42%] 2025-09-07T09:35:00.8126167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4893s] [ 42%] 2025-09-07T09:35:00.8126456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2746s] [ 42%] 2025-09-07T09:35:00.8126806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1163s] [ 42%] 2025-09-07T09:35:00.8127079Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1537s] [ 42%] 2025-09-07T09:35:00.8127352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5106s] [ 42%] 2025-09-07T09:35:00.8127628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1874s] [ 42%] 2025-09-07T09:35:00.8128924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1173s] [ 42%] 2025-09-07T09:35:00.8129204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1773s] [ 42%] 2025-09-07T09:35:00.8129477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4614s] [ 42%] 2025-09-07T09:35:00.8129793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2848s] [ 42%] 2025-09-07T09:35:00.8130097Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1807s] [ 42%] 2025-09-07T09:35:00.8130457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1988s] [ 42%] 2025-09-07T09:35:00.8130756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5623s] [ 42%] 2025-09-07T09:35:00.8131048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3816s] [ 42%] 2025-09-07T09:35:00.8131390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1817s] [ 42%] 2025-09-07T09:35:00.8131705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1975s] [ 42%] 2025-09-07T09:35:00.8132037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5733s] [ 42%] 2025-09-07T09:35:00.8132331Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3961s] [ 42%] 2025-09-07T09:35:00.8132603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1620s] [ 42%] 2025-09-07T09:35:00.8132878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2257s] [ 42%] 2025-09-07T09:35:00.8133150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5686s] [ 42%] 2025-09-07T09:35:00.8133429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3774s] [ 42%] 2025-09-07T09:35:00.8133755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1697s] [ 42%] 2025-09-07T09:35:00.8142047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2087s] [ 42%] 2025-09-07T09:35:00.8142467Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.6006s] [ 42%] 2025-09-07T09:35:00.8142773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3561s] [ 42%] 2025-09-07T09:35:00.8143066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1855s] [ 42%] 2025-09-07T09:35:00.8143360Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2259s] [ 42%] 2025-09-07T09:35:00.8143644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5523s] [ 42%] 2025-09-07T09:35:00.8143941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3643s] [ 42%] 2025-09-07T09:35:00.8144224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1816s] [ 42%] 2025-09-07T09:35:00.8144537Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.2255s] [ 42%] 2025-09-07T09:35:00.8144836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.5409s] [ 42%] 2025-09-07T09:35:00.8145113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3653s] [ 42%] 2025-09-07T09:35:00.8145385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1805s] [ 42%] 2025-09-07T09:35:00.8145661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2255s] [ 42%] 2025-09-07T09:35:00.8145934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5406s] [ 42%] 2025-09-07T09:35:00.8146208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2773s] [ 42%] 2025-09-07T09:35:00.8146553Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1824s] [ 42%] 2025-09-07T09:35:00.8146852Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.2260s] [ 42%] 2025-09-07T09:35:00.8147154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.5474s] [ 42%] 2025-09-07T09:35:00.8149339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3397s] [ 42%] 2025-09-07T09:35:00.8149626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1025s] [ 42%] 2025-09-07T09:35:00.8149909Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1195s] [ 42%] 2025-09-07T09:35:00.8150184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3285s] [ 42%] 2025-09-07T09:35:00.8150462Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2182s] [ 42%] 2025-09-07T09:35:00.8150777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0962s] [ 42%] 2025-09-07T09:35:00.8151074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1003s] [ 42%] 2025-09-07T09:35:00.8151348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3173s] [ 42%] 2025-09-07T09:35:00.8151623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2283s] [ 42%] 2025-09-07T09:35:00.8151897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1023s] [ 42%] 2025-09-07T09:35:00.8152166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1189s] [ 42%] 2025-09-07T09:35:00.8152437Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.3189s] [ 42%] 2025-09-07T09:35:00.8152709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2027s] [ 42%] 2025-09-07T09:35:00.8153004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0887s] [ 42%] 2025-09-07T09:35:00.8153292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1075s] [ 42%] 2025-09-07T09:35:00.8153562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.3432s] [ 42%] 2025-09-07T09:35:00.8153842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2313s] [ 42%] 2025-09-07T09:35:00.8154117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1197s] [ 42%] 2025-09-07T09:35:00.8154392Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1503s] [ 42%] 2025-09-07T09:35:00.8154664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4539s] [ 42%] 2025-09-07T09:35:00.8154957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3129s] [ 42%] 2025-09-07T09:35:00.8155249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1420s] [ 42%] 2025-09-07T09:35:00.8155527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1668s] [ 42%] 2025-09-07T09:35:00.8155802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4590s] [ 42%] 2025-09-07T09:35:00.8156077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2865s] [ 42%] 2025-09-07T09:35:00.8156349Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1348s] [ 42%] 2025-09-07T09:35:00.8156705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1663s] [ 42%] 2025-09-07T09:35:00.8156980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4728s] [ 42%] 2025-09-07T09:35:00.8157284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2489s] [ 42%] 2025-09-07T09:35:00.8157577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1067s] [ 42%] 2025-09-07T09:35:00.8157851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1666s] [ 42%] 2025-09-07T09:35:00.8158124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4794s] [ 43%] 2025-09-07T09:35:00.8158405Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2843s] [ 43%] 2025-09-07T09:35:00.8158680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1420s] [ 43%] 2025-09-07T09:35:00.8160255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1571s] [ 43%] 2025-09-07T09:35:00.8160555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4603s] [ 43%] 2025-09-07T09:35:00.8160831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3090s] [ 43%] 2025-09-07T09:35:00.8161106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1267s] [ 43%] 2025-09-07T09:35:00.8161382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1316s] [ 43%] 2025-09-07T09:35:00.8161659Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4939s] [ 43%] 2025-09-07T09:35:00.8161931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2907s] [ 43%] 2025-09-07T09:35:00.8162203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1148s] [ 43%] 2025-09-07T09:35:00.8162494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1584s] [ 43%] 2025-09-07T09:35:00.8162781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4884s] [ 43%] 2025-09-07T09:35:00.8163055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2831s] [ 43%] 2025-09-07T09:35:00.8163326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1139s] [ 43%] 2025-09-07T09:35:00.8163600Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1668s] [ 43%] 2025-09-07T09:35:00.8163872Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4894s] [ 43%] 2025-09-07T09:35:00.8164143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2885s] [ 43%] 2025-09-07T09:35:00.8164429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0759s] [ 43%] 2025-09-07T09:35:00.8164715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1206s] [ 43%] 2025-09-07T09:35:00.8164985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3531s] [ 43%] 2025-09-07T09:35:00.8165258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2133s] [ 43%] 2025-09-07T09:35:00.8165531Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1065s] [ 43%] 2025-09-07T09:35:00.8165802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1025s] [ 43%] 2025-09-07T09:35:00.8167286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.3300s] [ 43%] 2025-09-07T09:35:00.8167600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2161s] [ 43%] 2025-09-07T09:35:00.8167919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0986s] [ 43%] 2025-09-07T09:35:00.8168222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1207s] [ 43%] 2025-09-07T09:35:00.8168488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.3277s] [ 43%] 2025-09-07T09:35:00.8168758Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.1301s] [ 43%] 2025-09-07T09:35:00.8169030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1027s] [ 43%] 2025-09-07T09:35:00.8169302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1203s] [ 43%] 2025-09-07T09:35:00.8169570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.3413s] [ 43%] 2025-09-07T09:35:00.8169865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.2212s] [ 43%] 2025-09-07T09:35:00.8170157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1242s] [ 43%] 2025-09-07T09:35:00.8170430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1407s] [ 43%] 2025-09-07T09:35:00.8170701Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4892s] [ 43%] 2025-09-07T09:35:00.8170976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3008s] [ 43%] 2025-09-07T09:35:00.8171252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1229s] [ 43%] 2025-09-07T09:35:00.8171524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1404s] [ 43%] 2025-09-07T09:35:00.8171802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4874s] [ 43%] 2025-09-07T09:35:00.8172084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.3073s] [ 43%] 2025-09-07T09:35:00.8172366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1243s] [ 43%] 2025-09-07T09:35:00.8172638Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1406s] [ 43%] 2025-09-07T09:35:00.8174016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4881s] [ 43%] 2025-09-07T09:35:00.8174293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2902s] [ 43%] 2025-09-07T09:35:00.8174564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1147s] [ 43%] 2025-09-07T09:35:00.8174834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1669s] [ 43%] 2025-09-07T09:35:00.8175130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4653s] [ 43%] 2025-09-07T09:35:00.8175423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3139s] [ 43%] 2025-09-07T09:35:00.8175697Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1222s] [ 43%] 2025-09-07T09:35:00.8175967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1686s] [ 43%] 2025-09-07T09:35:00.8176238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4553s] [ 43%] 2025-09-07T09:35:00.8176605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.3048s] [ 43%] 2025-09-07T09:35:00.8176879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.1277s] [ 43%] 2025-09-07T09:35:00.8177155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1399s] [ 43%] 2025-09-07T09:35:00.8177471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.4837s] [ 43%] 2025-09-07T09:35:00.8177759Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2748s] [ 43%] 2025-09-07T09:35:00.8178027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1421s] [ 43%] 2025-09-07T09:35:00.8178296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1678s] [ 43%] 2025-09-07T09:35:00.8178571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4708s] [ 43%] 2025-09-07T09:35:00.8178845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.2076s] [ 43%] 2025-09-07T09:35:00.8179209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.1368s] [ 43%] 2025-09-07T09:35:00.8179512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1671s] [ 43%] 2025-09-07T09:35:00.8181866Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.4892s] [ 43%] 2025-09-07T09:35:00.8182162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0047s] [ 43%] 2025-09-07T09:35:00.8182439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0038s] [ 43%] 2025-09-07T09:35:00.8182714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 43%] 2025-09-07T09:35:00.8182987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0057s] [ 43%] 2025-09-07T09:35:00.8183261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 43%] 2025-09-07T09:35:00.8183535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 43%] 2025-09-07T09:35:00.8183836Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 43%] 2025-09-07T09:35:00.8184124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0055s] [ 43%] 2025-09-07T09:35:00.8184394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8184662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0044s] [ 43%] 2025-09-07T09:35:00.8184934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0062s] [ 43%] 2025-09-07T09:35:00.8185200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0055s] [ 43%] 2025-09-07T09:35:00.8185469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0035s] [ 43%] 2025-09-07T09:35:00.8185757Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0035s] [ 43%] 2025-09-07T09:35:00.8186039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0062s] [ 43%] 2025-09-07T09:35:00.8186311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0054s] [ 43%] 2025-09-07T09:35:00.8186731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 43%] 2025-09-07T09:35:00.8187009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8187282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0063s] [ 43%] 2025-09-07T09:35:00.8188730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0057s] [ 43%] 2025-09-07T09:35:00.8189011Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8189330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0037s] [ 43%] 2025-09-07T09:35:00.8189625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0063s] [ 43%] 2025-09-07T09:35:00.8189899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0058s] [ 43%] 2025-09-07T09:35:00.8190171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8190444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8190714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0063s] [ 43%] 2025-09-07T09:35:00.8190983Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0054s] [ 43%] 2025-09-07T09:35:00.8191272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8191560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.8191827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0063s] [ 43%] 2025-09-07T09:35:00.8192098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0053s] [ 43%] 2025-09-07T09:35:00.8192371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.8192643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.8192915Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0061s] [ 43%] 2025-09-07T09:35:00.8193184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0054s] [ 43%] 2025-09-07T09:35:00.8193475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.8193758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.8194031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0061s] [ 43%] 2025-09-07T09:35:00.8195341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0054s] [ 43%] 2025-09-07T09:35:00.8195619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.8195889Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.8196176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0061s] [ 43%] 2025-09-07T09:35:00.8196460Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0053s] [ 43%] 2025-09-07T09:35:00.8196790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.8197061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.8197331Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0061s] [ 43%] 2025-09-07T09:35:00.8197606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0054s] [ 43%] 2025-09-07T09:35:00.8197963Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8198312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8198681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8199054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8199404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8199755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 43%] 2025-09-07T09:35:00.8200102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8200471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8200832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8201177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 43%] 2025-09-07T09:35:00.8201519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8202904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8203258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8203603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8203970Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8204333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8204681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8205033Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8205383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8205728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8206093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8206459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8206979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 44%] 2025-09-07T09:35:00.8207326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8207674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8208020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8208401Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8208762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8209107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8209455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8209799Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8211191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8211568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8211937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8212286Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8212632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8212981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8213334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8213706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8214067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 44%] 2025-09-07T09:35:00.8214415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8214759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8215102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8215444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8215811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8216171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8216579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8216923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8217198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 44%] 2025-09-07T09:35:00.8217471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 44%] 2025-09-07T09:35:00.8217746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0060s] [ 44%] 2025-09-07T09:35:00.8219180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0050s] [ 44%] 2025-09-07T09:35:00.8219479Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8219752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8220027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0059s] [ 44%] 2025-09-07T09:35:00.8220300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0050s] [ 44%] 2025-09-07T09:35:00.8220568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8220836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8221127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0059s] [ 44%] 2025-09-07T09:35:00.8221415Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 44%] 2025-09-07T09:35:00.8221684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8221955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 44%] 2025-09-07T09:35:00.8222226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0059s] [ 44%] 2025-09-07T09:35:00.8222494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 44%] 2025-09-07T09:35:00.8222765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 44%] 2025-09-07T09:35:00.8223037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 44%] 2025-09-07T09:35:00.8223335Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 44%] 2025-09-07T09:35:00.8223622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0055s] [ 44%] 2025-09-07T09:35:00.8223897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 44%] 2025-09-07T09:35:00.8224170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 44%] 2025-09-07T09:35:00.8224444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0063s] [ 44%] 2025-09-07T09:35:00.8224716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0057s] [ 44%] 2025-09-07T09:35:00.8226032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0031s] [ 44%] 2025-09-07T09:35:00.8226320Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 44%] 2025-09-07T09:35:00.8226645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0061s] [ 44%] 2025-09-07T09:35:00.8226915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0052s] [ 44%] 2025-09-07T09:35:00.8227185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 44%] 2025-09-07T09:35:00.8227456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 44%] 2025-09-07T09:35:00.8227724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0060s] [ 44%] 2025-09-07T09:35:00.8227993Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0051s] [ 44%] 2025-09-07T09:35:00.8228267Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8228579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 44%] 2025-09-07T09:35:00.8228887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0063s] [ 44%] 2025-09-07T09:35:00.8229158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0054s] [ 44%] 2025-09-07T09:35:00.8229436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.8229711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 44%] 2025-09-07T09:35:00.8229983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0062s] [ 44%] 2025-09-07T09:35:00.8230292Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0054s] [ 44%] 2025-09-07T09:35:00.8230579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 44%] 2025-09-07T09:35:00.8230849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 44%] 2025-09-07T09:35:00.8231122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0060s] [ 44%] 2025-09-07T09:35:00.8231391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0051s] [ 44%] 2025-09-07T09:35:00.8232680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.8232953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.8233221Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0060s] [ 44%] 2025-09-07T09:35:00.8233516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0050s] [ 44%] 2025-09-07T09:35:00.8233887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8234235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8234587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8234931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8235279Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8235650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8236012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8236361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8236757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8237102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 44%] 2025-09-07T09:35:00.8237446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8237829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8238196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8238539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8238886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8239232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8240599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8240988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8241369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8241715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8242065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8242414Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8242760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8243107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8243475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8243842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8244183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 44%] 2025-09-07T09:35:00.8244533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8244878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8245237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8245593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8245937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8246286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8246691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8247039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8247384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 44%] 2025-09-07T09:35:00.8248787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8249156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8249504Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8249853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8250202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8250564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8250926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8251272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 45%] 2025-09-07T09:35:00.8251620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8251969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8252315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8252661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8252947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 45%] 2025-09-07T09:35:00.8253231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8253500Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8253767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8254040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.8254312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.8254595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8254876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8256134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.8256401Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.8256750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8257020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8257291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.8257561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.8257828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8258124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8258416Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8258686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8259008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8259278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8259551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8259844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8260133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8260404Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8260669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8260935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.8261202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8261467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8261734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8263036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8263336Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8263619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8263888Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8264160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8264434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8264702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8264991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8265276Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8265546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8265815Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8266083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8266353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8266672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.8266936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8267203Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8267500Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.8267785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8268052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.8268402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8269763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8270111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8270478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8270848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8271194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8271538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8271882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8272221Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8272580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8272939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8273278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8273622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8273968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 
2025-09-07T09:35:00.8274308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8274662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8275020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8275368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8275713Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8276056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8276403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8277870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8278237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8278585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8278930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8279275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8279613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8279971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8280350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8280693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8281039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8281383Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8281730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8282076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8282436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8282791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8283144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
45%] 2025-09-07T09:35:00.8283492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8283837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8284196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8284555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8285893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8286237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8286644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8286988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8287332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8287708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8288069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 45%] 2025-09-07T09:35:00.8288350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0061s] [ 
45%] 2025-09-07T09:35:00.8288631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 45%] 2025-09-07T09:35:00.8288910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0185s] [ 45%] 2025-09-07T09:35:00.8289183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0132s] [ 45%] 2025-09-07T09:35:00.8289487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0058s] [ 45%] 2025-09-07T09:35:00.8289781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 45%] 2025-09-07T09:35:00.8290057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0183s] [ 45%] 2025-09-07T09:35:00.8290330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0132s] [ 45%] 2025-09-07T09:35:00.8290604Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0058s] [ 45%] 2025-09-07T09:35:00.8290880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0058s] [ 45%] 2025-09-07T09:35:00.8291151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0565s] [ 45%] 2025-09-07T09:35:00.8291423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0134s] [ 45%] 2025-09-07T09:35:00.8291720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0414s] [ 45%] 2025-09-07T09:35:00.8293027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0067s] [ 45%] 2025-09-07T09:35:00.8293302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0962s] [ 45%] 2025-09-07T09:35:00.8293575Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0487s] [ 46%] 2025-09-07T09:35:00.8293856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0420s] [ 46%] 2025-09-07T09:35:00.8294139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0080s] [ 46%] 2025-09-07T09:35:00.8294418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1229s] [ 46%] 2025-09-07T09:35:00.8294722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0978s] [ 46%] 2025-09-07T09:35:00.8295021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0281s] [ 46%] 2025-09-07T09:35:00.8295299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0629s] [ 46%] 2025-09-07T09:35:00.8295574Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.1325s] [ 46%] 2025-09-07T09:35:00.8295850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0609s] [ 46%] 2025-09-07T09:35:00.8296124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0071s] [ 46%] 2025-09-07T09:35:00.8296399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0539s] [ 46%] 2025-09-07T09:35:00.8296740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0897s] [ 46%] 2025-09-07T09:35:00.8297055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0862s] [ 46%] 2025-09-07T09:35:00.8297348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0070s] [ 46%] 2025-09-07T09:35:00.8297623Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0323s] [ 46%] 2025-09-07T09:35:00.8297901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.1269s] [ 46%] 2025-09-07T09:35:00.8298177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0158s] [ 46%] 2025-09-07T09:35:00.8298454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0114s] [ 46%] 2025-09-07T09:35:00.8298733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0070s] [ 46%] 2025-09-07T09:35:00.8300121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0233s] [ 46%] 2025-09-07T09:35:00.8300424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0175s] [ 46%] 2025-09-07T09:35:00.8300702Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0069s] [ 46%] 2025-09-07T09:35:00.8300980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0069s] [ 46%] 2025-09-07T09:35:00.8301258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0355s] [ 46%] 2025-09-07T09:35:00.8301535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0290s] [ 46%] 2025-09-07T09:35:00.8301809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0070s] [ 46%] 2025-09-07T09:35:00.8302087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0070s] [ 46%] 2025-09-07T09:35:00.8302386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0333s] [ 46%] 2025-09-07T09:35:00.8302676Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0289s] [ 46%] 2025-09-07T09:35:00.8302952Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0070s] [ 46%] 2025-09-07T09:35:00.8303229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0069s] [ 46%] 2025-09-07T09:35:00.8303506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0352s] [ 46%] 2025-09-07T09:35:00.8303778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0186s] [ 46%] 2025-09-07T09:35:00.8304153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8304521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8304877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8305230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8305585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8305937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8307379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8307777Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8308146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8308493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8308845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8309192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8309580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 46%] 2025-09-07T09:35:00.8309959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8310307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8310655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8311010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8311359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8311708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8312075Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8312441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8312794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8313145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8313495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8313862Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8314230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8315589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8315937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8316288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8316702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 46%] 2025-09-07T09:35:00.8317051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8317444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8317827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8318179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8318530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8318879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8319273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8319657Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8320010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8320360Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8320709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8321057Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8321410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8321782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8329090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8329919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8330676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 46%] 2025-09-07T09:35:00.8331429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 46%] 2025-09-07T09:35:00.8332146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0067s] [ 46%] 2025-09-07T09:35:00.8335115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0065s] [ 46%] 2025-09-07T09:35:00.8335744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0203s] [ 46%] 2025-09-07T09:35:00.8336342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0149s] [ 46%] 2025-09-07T09:35:00.8337033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0064s] [ 46%] 2025-09-07T09:35:00.8337682Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0065s] [ 46%] 2025-09-07T09:35:00.8338273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0203s] [ 46%] 2025-09-07T09:35:00.8339050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0151s] [ 46%] 2025-09-07T09:35:00.8339707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0068s] [ 46%] 2025-09-07T09:35:00.8340323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0067s] [ 46%] 2025-09-07T09:35:00.8342270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0215s] [ 46%] 2025-09-07T09:35:00.8342859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0158s] [ 46%] 2025-09-07T09:35:00.8343458Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0068s] [ 46%] 2025-09-07T09:35:00.8344062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0067s] [ 46%] 2025-09-07T09:35:00.8344656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0213s] [ 46%] 2025-09-07T09:35:00.8345294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0158s] [ 46%] 2025-09-07T09:35:00.8345909Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0075s] [ 46%] 2025-09-07T09:35:00.8346579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0075s] [ 46%] 2025-09-07T09:35:00.8347174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0243s] [ 46%] 2025-09-07T09:35:00.8347769Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0166s] [ 46%] 2025-09-07T09:35:00.8348371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8350139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8350730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0242s] [ 46%] 2025-09-07T09:35:00.8351366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0166s] [ 46%] 2025-09-07T09:35:00.8351965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8352555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0075s] [ 46%] 2025-09-07T09:35:00.8353146Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0242s] [ 46%] 2025-09-07T09:35:00.8353741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0165s] [ 46%] 2025-09-07T09:35:00.8354329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8354918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8355536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0241s] [ 46%] 2025-09-07T09:35:00.8358184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0165s] [ 46%] 2025-09-07T09:35:00.8358787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0075s] [ 46%] 2025-09-07T09:35:00.8359379Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8359971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0240s] [ 46%] 2025-09-07T09:35:00.8360568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0163s] [ 46%] 2025-09-07T09:35:00.8361157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8361749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8362387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0238s] [ 46%] 2025-09-07T09:35:00.8362996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0165s] [ 46%] 2025-09-07T09:35:00.8363586Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0074s] [ 46%] 2025-09-07T09:35:00.8365513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0074s] [ 47%] 2025-09-07T09:35:00.8366102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0241s] [ 47%] 2025-09-07T09:35:00.8366757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0163s] [ 47%] 2025-09-07T09:35:00.8367366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0073s] [ 47%] 2025-09-07T09:35:00.8367972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0074s] [ 47%] 2025-09-07T09:35:00.8368557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0240s] [ 47%] 2025-09-07T09:35:00.8369141Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0164s] [ 47%] 2025-09-07T09:35:00.8369810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8370554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8371293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8374036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8374842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8375606Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8376350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8377166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8377906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8378684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8379509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 47%] 2025-09-07T09:35:00.8380244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8380981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8381724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8383655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8384399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8385195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8385959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8386780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8387519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8388264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8389056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8389820Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8390560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8392394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8393144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8393881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8394611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 47%] 2025-09-07T09:35:00.8395392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8396151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8396957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8397691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8398425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8399207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8401108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8401850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8402597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8403348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8404091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8404873Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8405647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8406388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8407208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8407941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8408675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 47%] 2025-09-07T09:35:00.8410509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8411271Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8412005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8412673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 47%] 2025-09-07T09:35:00.8413259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0035s] [ 47%] 2025-09-07T09:35:00.8413841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0092s] [ 47%] 2025-09-07T09:35:00.8414420Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0065s] [ 47%] 2025-09-07T09:35:00.8415035Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 47%] 2025-09-07T09:35:00.8415648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 47%] 2025-09-07T09:35:00.8416232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0091s] [ 47%] 2025-09-07T09:35:00.8417930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0064s] [ 47%] 2025-09-07T09:35:00.8418518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0035s] [ 47%] 2025-09-07T09:35:00.8419178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0034s] [ 47%] 2025-09-07T09:35:00.8419754Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0092s] [ 47%] 2025-09-07T09:35:00.8420370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0068s] [ 47%] 2025-09-07T09:35:00.8420966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0035s] [ 47%] 2025-09-07T09:35:00.8421547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0035s] [ 47%] 2025-09-07T09:35:00.8422125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0092s] [ 47%] 2025-09-07T09:35:00.8422703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0067s] [ 47%] 2025-09-07T09:35:00.8423285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0044s] [ 47%] 2025-09-07T09:35:00.8423871Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0044s] [ 47%] 2025-09-07T09:35:00.8425493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 47%] 2025-09-07T09:35:00.8426103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0087s] [ 47%] 2025-09-07T09:35:00.8426777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8427364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8427955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 47%] 2025-09-07T09:35:00.8428549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0087s] [ 47%] 2025-09-07T09:35:00.8429132Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8429711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8430315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0126s] [ 47%] 2025-09-07T09:35:00.8430911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0086s] [ 47%] 2025-09-07T09:35:00.8432530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8433114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8433703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0126s] [ 47%] 2025-09-07T09:35:00.8434286Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0087s] [ 47%] 2025-09-07T09:35:00.8434870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8435458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8436086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0127s] [ 47%] 2025-09-07T09:35:00.8436752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0087s] [ 47%] 2025-09-07T09:35:00.8437335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8437923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8439545Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0126s] [ 47%] 2025-09-07T09:35:00.8440138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0088s] [ 47%] 2025-09-07T09:35:00.8440720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0042s] [ 47%] 2025-09-07T09:35:00.8441330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0042s] [ 47%] 2025-09-07T09:35:00.8441932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0126s] [ 47%] 2025-09-07T09:35:00.8442509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0087s] [ 47%] 2025-09-07T09:35:00.8443088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8443668Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0043s] [ 47%] 2025-09-07T09:35:00.8444251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0125s] [ 47%] 2025-09-07T09:35:00.8444828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0086s] [ 47%] 2025-09-07T09:35:00.8445487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8447359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8448111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8448847Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8449585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8450323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8451076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8451828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8452559Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 47%] 2025-09-07T09:35:00.8453289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8454017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8455773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8456594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8457358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8458105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8458829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8459620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8460359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8461117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8461867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 47%] 2025-09-07T09:35:00.8462603Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8463343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8465127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8465867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8466688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8467464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 48%] 2025-09-07T09:35:00.8468212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8468939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8469674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8470413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8471180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8471930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8473721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8474463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8475207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8475940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8476743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8477531Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8478298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8479034Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8479770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8480503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8482312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 48%] 2025-09-07T09:35:00.8483065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8483799Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8484537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8485274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8486003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8486726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 48%] 
2025-09-07T09:35:00.8487344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.8487947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8488525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8489109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.8490782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.8491367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.8491945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8492551Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.8493144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:00.8493720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8494291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8494869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.8495447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.8496022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8497735Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8498349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8499000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8499583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8500163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 48%] 2025-09-07T09:35:00.8500751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8501339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8501922Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8502533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 48%] 2025-09-07T09:35:00.8503131Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.8504897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.8505482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8506059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8506701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.8507279Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8507854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8508462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8509059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8509641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8510223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8510804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8512435Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8513023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8513642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 48%] 2025-09-07T09:35:00.8514248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 48%] 2025-09-07T09:35:00.8514828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8515405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8515982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8516636Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8517214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8517793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.8519463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:00.8520080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.8520736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8521477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 48%] 2025-09-07T09:35:00.8522209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8522934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8523704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8524456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8525187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8525921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8526730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8529032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8529818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8530557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8531278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8532005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8532732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8533453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8534204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8534953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8535681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8537666Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8538406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8539199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8539932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8540707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8541457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 48%] 2025-09-07T09:35:00.8542182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8542914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8543640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8544396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8546247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8547060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8547783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8548510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8549244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8549975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8550747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8551497Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8552230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8552962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8554777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8555533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8556283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 48%] 2025-09-07T09:35:00.8557074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8557798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8558524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8559254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 48%] 2025-09-07T09:35:00.8559979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8560726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8561398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 49%] 2025-09-07T09:35:00.8561981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8563652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8564235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8564817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8565425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8566031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 49%] 
2025-09-07T09:35:00.8566676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8567251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8567823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8568400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8568971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8570620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8571252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8571849Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8572423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8573002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8573586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8574177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8574763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:00.8575366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8575977Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8577959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8578544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:00.8579169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8579749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8580328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0039s] [ 49%] 2025-09-07T09:35:00.8580901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0037s] [ 49%] 2025-09-07T09:35:00.8581508Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8582121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8582699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8583278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8585451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8586055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8586719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8587345Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8587951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8588545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8589128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8589710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8590289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8590876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8591462Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8593835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8594436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8595021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8595608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8596187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8598042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8598779Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8599555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8600310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8601047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8601805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8602553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0008s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 49%] 2025-09-07T09:35:00.8603295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8604083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8604838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8605573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8606307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8607110Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8607850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8608609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8611154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8611882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8612623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8613357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8614087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8614859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8615616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8616354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8617167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8619536Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8620274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8621046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8621796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8622525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8623256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
49%] 2025-09-07T09:35:00.8624001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8624727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8625480Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8627878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8628636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8629369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8630103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8630848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8631621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8632374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8633108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8633834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8634568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8635295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8636074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8636907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8639163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8639899Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 49%] 2025-09-07T09:35:00.8640559Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 49%] 2025-09-07T09:35:00.8641146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8641725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8642334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8642929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8643513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8644094Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8646715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8647314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8647885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8648459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8649086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8649693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8650265Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:00.8650838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8651409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8651992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8654049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8654669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8655262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:00.8655843Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:00.8656426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:00.8657085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8657662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:00.8658236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:00.8658806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:00.8660889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8661493Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:00.8662065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8662648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8663223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8663790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8664361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:00.8664989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8665596Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8666175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8668223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:00.8668811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8669397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8669974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:00.8670548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8671166Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8671755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8672322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:00.8672896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:00.8674850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:00.8675434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8676003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:00.8676764Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8677535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8678261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8678982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8679705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8680441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 
2025-09-07T09:35:00.8681209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8683420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8684146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8684865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8685582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8686298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 50%] 2025-09-07T09:35:00.8687127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8687871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8688600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8689318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8690043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8690772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8691528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8693722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8694466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8695198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8695929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8696722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8697491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8698228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8699014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8699729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8700447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8702583Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8703310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8704070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8704816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8705545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8706273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 
2025-09-07T09:35:00.8707061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8707815Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8708562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8710751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8711485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8712210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8712939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8713659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8714420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8715169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8715899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8716702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8718848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8719567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0041s] [ 50%] 2025-09-07T09:35:00.8720181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 50%] 2025-09-07T09:35:00.8720762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0086s] [ 50%] 2025-09-07T09:35:00.8721349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0055s] [ 50%] 2025-09-07T09:35:00.8721936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:00.8722527Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:00.8723113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0085s] [ 50%] 2025-09-07T09:35:00.8723699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0056s] [ 50%] 2025-09-07T09:35:00.8724298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 50%] 2025-09-07T09:35:00.8724895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:00.8726925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0085s] [ 50%] 2025-09-07T09:35:00.8727501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0056s] [ 50%] 2025-09-07T09:35:00.8728080Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0030s] [ 50%] 2025-09-07T09:35:00.8728659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0030s] [ 50%] 2025-09-07T09:35:00.8729237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0085s] [ 50%] 2025-09-07T09:35:00.8729851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0055s] [ 50%] 2025-09-07T09:35:00.8730450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8730725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0034s] [ 50%] 2025-09-07T09:35:00.8731001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0091s] [ 50%] 2025-09-07T09:35:00.8731274Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 50%] 2025-09-07T09:35:00.8731552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8731833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8733467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0091s] [ 50%] 2025-09-07T09:35:00.8733774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 50%] 2025-09-07T09:35:00.8734063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8734333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8734606Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0090s] [ 50%] 2025-09-07T09:35:00.8734880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0059s] [ 50%] 2025-09-07T09:35:00.8735154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8735431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8735721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0091s] [ 50%] 2025-09-07T09:35:00.8736010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0059s] [ 50%] 2025-09-07T09:35:00.8736285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8736631Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8736904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0092s] [ 50%] 2025-09-07T09:35:00.8737178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 50%] 2025-09-07T09:35:00.8737454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8737729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8738043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0091s] [ 50%] 2025-09-07T09:35:00.8738345Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 50%] 2025-09-07T09:35:00.8738614Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8740316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8740598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0090s] [ 50%] 2025-09-07T09:35:00.8740872Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0060s] [ 50%] 2025-09-07T09:35:00.8741144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8741448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 50%] 2025-09-07T09:35:00.8741739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0090s] [ 50%] 2025-09-07T09:35:00.8742009Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0060s] [ 50%] 2025-09-07T09:35:00.8742363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8742720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8743070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 50%] 2025-09-07T09:35:00.8743416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8743786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8744149Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8744496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8744846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8745194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8745542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8745914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 51%] 2025-09-07T09:35:00.8747698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8748057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8748404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8748750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8749096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8749476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does 
not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8749850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8750207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8750553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8750908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8751255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8751630Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8751999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8752344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8752694Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8753040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8753384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 51%] 2025-09-07T09:35:00.8753750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8755521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8755884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8756230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8756648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8756998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8757382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8757754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8758104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8758455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8758802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8759148Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8759511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8759873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8760219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8760567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8760918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 51%] 2025-09-07T09:35:00.8761269Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8761630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8763380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8763674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 51%] 2025-09-07T09:35:00.8763951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8764229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0094s] [ 51%] 2025-09-07T09:35:00.8764502Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0058s] [ 51%] 2025-09-07T09:35:00.8764783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8765085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8765384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0093s] [ 51%] 2025-09-07T09:35:00.8765663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0059s] [ 51%] 2025-09-07T09:35:00.8765938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8766211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8766480Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0093s] [ 51%] 2025-09-07T09:35:00.8766821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0058s] [ 51%] 2025-09-07T09:35:00.8767130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8767419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0029s] [ 51%] 2025-09-07T09:35:00.8767690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0093s] [ 51%] 2025-09-07T09:35:00.8767962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0059s] [ 51%] 2025-09-07T09:35:00.8768241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:00.8768517Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8768794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0099s] [ 51%] 2025-09-07T09:35:00.8769064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0063s] [ 51%] 2025-09-07T09:35:00.8769360Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8769654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:00.8769930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0100s] [ 51%] 2025-09-07T09:35:00.8770208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0064s] [ 51%] 2025-09-07T09:35:00.8770483Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8770758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:00.8773777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0099s] [ 51%] 2025-09-07T09:35:00.8774071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0063s] [ 51%] 2025-09-07T09:35:00.8774357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8774628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8774898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0099s] [ 51%] 2025-09-07T09:35:00.8775172Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0063s] [ 51%] 2025-09-07T09:35:00.8775448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:00.8775728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8775999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0100s] [ 51%] 2025-09-07T09:35:00.8776293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0062s] [ 51%] 2025-09-07T09:35:00.8776646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:00.8776921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8777197Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0099s] [ 51%] 2025-09-07T09:35:00.8777475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0063s] [ 51%] 2025-09-07T09:35:00.8777751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8778025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8778326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0100s] [ 51%] 2025-09-07T09:35:00.8778614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0062s] [ 51%] 2025-09-07T09:35:00.8780612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8780900Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:00.8781172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0099s] [ 51%] 2025-09-07T09:35:00.8781445Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0062s] [ 51%] 2025-09-07T09:35:00.8781799Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8782151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8782531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8782898Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8783254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8783604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8783956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8784322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8784683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 51%] 2025-09-07T09:35:00.8785034Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8785381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8785726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8786079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8786430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8788277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0007s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8788651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8789003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8789355Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8789706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8790094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8790472Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8790827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8791175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8791524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8791871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8792221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q 
!= seq_len_k) [ 51%] 2025-09-07T09:35:00.8792586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8792944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8793295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 51%] 2025-09-07T09:35:00.8793654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8794000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8794364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8794731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8796473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8796887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8797234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8797587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8797937Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8798332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8798719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8799064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8799412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8799759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 52%] 2025-09-07T09:35:00.8800143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8800511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8800861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8801211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8801556Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8801834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0031s] [ 52%] 
2025-09-07T09:35:00.8802107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8802400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8804038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8804320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 52%] 2025-09-07T09:35:00.8804595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 52%] 2025-09-07T09:35:00.8804868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8805138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8805407Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 52%] 2025-09-07T09:35:00.8805699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 52%] 2025-09-07T09:35:00.8805992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8806260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8806602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 52%] 2025-09-07T09:35:00.8806873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 52%] 2025-09-07T09:35:00.8807142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8807413Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8807685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8807993Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8808290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8808562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 52%] 2025-09-07T09:35:00.8808836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8809110Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8810716Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8810997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8811306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8811591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8811860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8812128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:00.8812403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8812672Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8812941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8813213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:00.8813503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8813789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8814058Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:00.8814332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:00.8814607Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8814880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8815150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8815438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 52%] 2025-09-07T09:35:00.8815718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8817449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8817730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8818002Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:00.8818273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8818542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 52%] 2025-09-07T09:35:00.8818810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8819171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:00.8819540Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8819893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 52%] 2025-09-07T09:35:00.8820243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8820591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8820938Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8821305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8821671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8822019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not 
accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8822360Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8822706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8823047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8824791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8825168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8825510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8825854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8826202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8826611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8827007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0007s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8827384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8827731Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8828082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8828434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8828782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8829160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8829539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 52%] 2025-09-07T09:35:00.8829881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8830221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8830564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8830910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8832723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8833091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8833440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8833786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8834134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8834479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8834824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8835201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED 
[0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8835569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8835915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8836264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8836676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8837054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8837436Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8837780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8838128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8838474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8838817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 2025-09-07T09:35:00.8839161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 52%] 
2025-09-07T09:35:00.8839481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0029s] [ 52%] 2025-09-07T09:35:00.8839784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8841371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 52%] 2025-09-07T09:35:00.8841651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 52%] 2025-09-07T09:35:00.8841924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8842195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 52%] 2025-09-07T09:35:00.8842465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:00.8842760Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 52%] 2025-09-07T09:35:00.8843041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8843312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8843587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8843854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8844120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8844386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8844652Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8844936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8845231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8845504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8845780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8846048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8846319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8847972Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8848290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8848578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8848849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8849117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8849389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8849653Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8849923Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8850196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8850494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8850776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8851050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8851321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8851590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8851858Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8852128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8852419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8852704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8852974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8854531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8854807Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8855073Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8855337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8855606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8855897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8856178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8856444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8856797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8857070Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8857337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8857604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8857894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8858184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8858451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8858716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8859067Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8859335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8860929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8861207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8861502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8861785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8862048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8862314Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8862584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8862855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8863122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8863415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8863700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8863969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8864238Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8864507Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8864775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8865041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8865306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8865594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8865882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8867546Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8867820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8868088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8868359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8868629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8868926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8869210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8869481Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8869751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8870025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8870293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8870560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8870826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8871113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:00.8871394Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8871660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8871926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8872195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:00.8872461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:00.8874045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8874344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8874628Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8874894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8875169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8875441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8875711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8875979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8876245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8876619Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8876904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8877167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8877435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8877704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 53%] 2025-09-07T09:35:00.8877968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8878233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8878539Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8878846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8880713Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8880993Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:00.8881265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8881536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8881806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8882079Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8882371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8882655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8882922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8883191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8883462Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8883729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8883993Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:00.8884285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:00.8884576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8884848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 53%] 2025-09-07T09:35:00.8885120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8885390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8885664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8887335Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8887613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8887925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8888209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8888476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8888744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8889010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8889278Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8889551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8889841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8890126Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8890395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8890666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8890935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8891201Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8891476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8891744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8892040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8893722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8893998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8894268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8894532Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8894799Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8895065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8895359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8895637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8895905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8896177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8896448Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8896785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8897053Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8897323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8897593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8897901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8898186Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8898454Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8898726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8899051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8899317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8901022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8901319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8901587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8901853Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8902121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8902395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8902663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8902929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8903198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8903492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8903776Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 54%] 2025-09-07T09:35:00.8904046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8904313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8904579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8904845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8905111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8905411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8905696Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8905963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:00.8907898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8908184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8908454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 54%] 2025-09-07T09:35:00.8908722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8908988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8909312Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0021s] [ 54%] 2025-09-07T09:35:00.8909622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0021s] [ 54%] 2025-09-07T09:35:00.8909888Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8910155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8910421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8910689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 54%] 2025-09-07T09:35:00.8910957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8911267Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8911560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8911825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0021s] [ 54%] 2025-09-07T09:35:00.8912090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8912356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8912624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8912894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8913165Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8913453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8915148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8915428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8915698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8915965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8916231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8916565Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8916881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8917170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8917434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8917699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8917966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8918232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8918498Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8918766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8919074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8919372Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8919641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8919910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8920178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8923028Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8923295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8923587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8923871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8924137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8924403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8924671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:00.8924937Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:00.8925199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8925467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8925758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8926037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8926301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8926640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8926909Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8927174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8927437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8927742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8928025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8929895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8930170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:00.8930437Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:00.8930699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 55%] 2025-09-07T09:35:00.8930961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8931221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8931488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8931784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8932068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8932335Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8932610Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8932879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8933144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8933430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8933707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8933973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8934236Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8934500Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8934766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8937356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8937619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8937882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8938185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8938468Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8938736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8939260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8939529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8939798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8940068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8940359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8940639Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8940902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8941167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8941433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8941697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8941960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 55%] 2025-09-07T09:35:00.8942223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 55%] 2025-09-07T09:35:00.8942509Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8944445Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8944728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8945003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 55%] 2025-09-07T09:35:00.8945280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:00.8945555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8945828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8946126Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0033s] [ 55%] 2025-09-07T09:35:00.8946413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0030s] [ 55%] 2025-09-07T09:35:00.8946748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8947028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8947301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8947572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 55%] 2025-09-07T09:35:00.8947841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8948112Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 55%] 2025-09-07T09:35:00.8948419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0033s] [ 55%] 2025-09-07T09:35:00.8948721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 55%] 2025-09-07T09:35:00.8948995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8949269Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8949545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8951192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:00.8951482Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8951797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8952088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 55%] 2025-09-07T09:35:00.8952362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:00.8952634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8952908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8953178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8953450Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:00.8953725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8954016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0027s] [ 55%] 2025-09-07T09:35:00.8954299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0036s] [ 55%] 2025-09-07T09:35:00.8954568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 55%] 2025-09-07T09:35:00.8954842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8955117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8955391Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8955663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 55%] 2025-09-07T09:35:00.8955960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8956254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8957912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8958202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:00.8958474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8958746Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0039s] [ 55%] 2025-09-07T09:35:00.8959016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8959287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 55%] 2025-09-07T09:35:00.8959602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8959889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0028s] [ 55%] 2025-09-07T09:35:00.8960162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 55%] 2025-09-07T09:35:00.8960436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 55%] 2025-09-07T09:35:00.8960793Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8961142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8961513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8961879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8962230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8962580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) 
[ 55%] 2025-09-07T09:35:00.8962934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8963281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8964936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8965316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8965679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8966023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8966368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8966783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8967172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8967540Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8967890Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8968238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8968586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8968933Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8969282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8969665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8970035Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8970390Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8970735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8971080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8972790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8973159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8973509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
55%] 2025-09-07T09:35:00.8973853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8974200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8974546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8974893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8975263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8975633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8975978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8976329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8976748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8978573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 55%] 2025-09-07T09:35:00.8978951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8979368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 
SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8979714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8980059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8980406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8982104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8982505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8982864Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8983210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.8983495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8983772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 56%] 2025-09-07T09:35:00.8984044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0037s] [ 56%] 2025-09-07T09:35:00.8984334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:00.8984621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 
PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8984897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8985169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0034s] [ 56%] 2025-09-07T09:35:00.8985443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 56%] 2025-09-07T09:35:00.8985712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 56%] 2025-09-07T09:35:00.8985988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 56%] 2025-09-07T09:35:00.8986257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0034s] [ 56%] 2025-09-07T09:35:00.8986620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 56%] 2025-09-07T09:35:00.8986910Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0025s] [ 56%] 2025-09-07T09:35:00.8987182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 56%] 2025-09-07T09:35:00.8988824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0034s] [ 56%] 2025-09-07T09:35:00.8989108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0030s] [ 56%] 2025-09-07T09:35:00.8989389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 56%] 2025-09-07T09:35:00.8989662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 56%] 2025-09-07T09:35:00.8989971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8990275Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:00.8990555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8990831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8991106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8991385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 56%] 2025-09-07T09:35:00.8991656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8991928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8992216Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0035s] [ 56%] 2025-09-07T09:35:00.8992498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:00.8992772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8993045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8993317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8993591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0031s] [ 56%] 2025-09-07T09:35:00.8993869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 56%] 2025-09-07T09:35:00.8995470Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8995770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8996044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0031s] [ 56%] 2025-09-07T09:35:00.8996319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8996670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8996946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8997219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:00.8997491Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8997796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8998091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8998359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:00.8998632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8998905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.8999177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0036s] [ 56%] 2025-09-07T09:35:00.8999449Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:00.8999823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9000187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0009s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9000538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9002203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9002572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9002923Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9003291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9003651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9003997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9004342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9004689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
56%] 2025-09-07T09:35:00.9005037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9005405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9005768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9006114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9006456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9006868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual 
when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9007217Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9007601Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9007974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9008324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9009996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9010354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED 
[0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9010703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9011097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9011471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9011816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9012159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9012509Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9012854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9013219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9013575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9013923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9014273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 56%] 2025-09-07T09:35:00.9014619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9014966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9015335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9015697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9016046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9017791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 
does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9018148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9018494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9018882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9019302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9019649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9019994Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9020342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9020687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 56%] 2025-09-07T09:35:00.9020988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0035s] [ 56%] 2025-09-07T09:35:00.9021276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0023s] [ 56%] 2025-09-07T09:35:00.9021550Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0027s] [ 56%] 2025-09-07T09:35:00.9021822Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0028s] [ 56%] 2025-09-07T09:35:00.9022094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0023s] [ 56%] 2025-09-07T09:35:00.9022372Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0022s] [ 56%] 2025-09-07T09:35:00.9022642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 56%] 2025-09-07T09:35:00.9022915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.9023200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 56%] 2025-09-07T09:35:00.9023478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 56%] 2025-09-07T09:35:00.9025080Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 56%] 2025-09-07T09:35:00.9025352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0024s] [ 56%] 2025-09-07T09:35:00.9025622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0022s] [ 56%] 2025-09-07T09:35:00.9025889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0022s] [ 56%] 2025-09-07T09:35:00.9026154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0024s] [ 56%] 2025-09-07T09:35:00.9026443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 56%] 2025-09-07T09:35:00.9026798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 56%] 2025-09-07T09:35:00.9027072Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 56%] 2025-09-07T09:35:00.9027341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:00.9027609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:00.9027885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9028156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9028424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9028749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:00.9029041Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9029308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9029580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:00.9029846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:00.9031631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9031919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9032222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:00.9032508Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:00.9032780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9033054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9033325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9033595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:00.9033866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9034138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9034437Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9034727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9034994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9035259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9035531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9035798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0025s] [ 57%] 2025-09-07T09:35:00.9036067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 PASSED [0.0024s] [ 57%] 2025-09-07T09:35:00.9036359Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 PASSED [0.0023s] [ 57%] 2025-09-07T09:35:00.9036708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:00.9038329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:00.9038686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9040622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9041530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9041873Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0008s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9042222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9042592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9042936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9043281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9043640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 
2025-09-07T09:35:00.9043983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9044362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9044730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9045074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9045418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9048616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 57%] 2025-09-07T09:35:00.9049081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9049430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9049780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9050129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9050475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9050825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept 
is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9051176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9051563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9051936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9052278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9052622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9053001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] 
(Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9053364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9053711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9054056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9054397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9054740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9055092Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9055439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9055814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9056178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9056604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9056951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 
57%] 2025-09-07T09:35:00.9057342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9057714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9058064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9058409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0004s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9058755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9059161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when 
seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9059505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9059853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9062062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9062444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 57%] 2025-09-07T09:35:00.9062721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [1.3429s] [ 57%] 2025-09-07T09:35:00.9062989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0010s] [ 57%] 2025-09-07T09:35:00.9063263Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0953s] [ 57%] 2025-09-07T09:35:00.9063552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0010s] [ 57%] 2025-09-07T09:35:00.9064099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.1054s] [ 57%] 2025-09-07T09:35:00.9064369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9064644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0870s] [ 57%] 2025-09-07T09:35:00.9064918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9065187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0848s] [ 57%] 2025-09-07T09:35:00.9065455Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9065724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0771s] [ 57%] 2025-09-07T09:35:00.9065995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9066265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0836s] [ 57%] 2025-09-07T09:35:00.9066656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9066942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0758s] [ 57%] 2025-09-07T09:35:00.9067212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0007s] [ 57%] 2025-09-07T09:35:00.9068857Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0851s] [ 57%] 2025-09-07T09:35:00.9069137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0006s] [ 57%] 2025-09-07T09:35:00.9069457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0855s] [ 57%] 2025-09-07T09:35:00.9069759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9070023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0835s] [ 57%] 2025-09-07T09:35:00.9070288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9070558Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0907s] [ 57%] 2025-09-07T09:35:00.9070827Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0010s] [ 57%] 2025-09-07T09:35:00.9071091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0674s] [ 57%] 2025-09-07T09:35:00.9071363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9071634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0640s] [ 57%] 2025-09-07T09:35:00.9071903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9072189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0595s] [ 57%] 2025-09-07T09:35:00.9072478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9072745Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0614s] [ 57%] 2025-09-07T09:35:00.9073012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0006s] [ 57%] 2025-09-07T09:35:00.9073280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0988s] [ 57%] 2025-09-07T09:35:00.9073562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9073844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0796s] [ 57%] 2025-09-07T09:35:00.9075455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9075741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0891s] [ 57%] 2025-09-07T09:35:00.9076011Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0009s] [ 57%] 2025-09-07T09:35:00.9076280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0851s] [ 57%] 2025-09-07T09:35:00.9076630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 57%] 2025-09-07T09:35:00.9076897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0732s] [ 57%] 2025-09-07T09:35:00.9077164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0004s] [ 57%] 2025-09-07T09:35:00.9077432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0927s] [ 57%] 2025-09-07T09:35:00.9077697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0009s] [ 57%] 2025-09-07T09:35:00.9078009Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0801s] [ 58%] 2025-09-07T09:35:00.9078306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0008s] [ 58%] 2025-09-07T09:35:00.9078572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0786s] [ 58%] 2025-09-07T09:35:00.9078839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0004s] [ 58%] 2025-09-07T09:35:00.9079141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0764s] [ 58%] 2025-09-07T09:35:00.9079430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0004s] [ 58%] 2025-09-07T09:35:00.9079699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0927s] [ 58%] 2025-09-07T09:35:00.9079964Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9080229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0867s] [ 58%] 2025-09-07T09:35:00.9080495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9083609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0890s] [ 58%] 2025-09-07T09:35:00.9083883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0010s] [ 58%] 2025-09-07T09:35:00.9084160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0578s] [ 58%] 2025-09-07T09:35:00.9084440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9084710Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0577s] [ 58%] 2025-09-07T09:35:00.9085012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0010s] [ 58%] 2025-09-07T09:35:00.9085297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0535s] [ 58%] 2025-09-07T09:35:00.9085572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9085842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0539s] [ 58%] 2025-09-07T09:35:00.9086121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9086420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0622s] [ 58%] 2025-09-07T09:35:00.9086767Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0007s] [ 58%] 2025-09-07T09:35:00.9087047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0597s] [ 58%] 2025-09-07T09:35:00.9087321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9087598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0600s] [ 58%] 2025-09-07T09:35:00.9087868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0004s] [ 58%] 2025-09-07T09:35:00.9088147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0630s] [ 58%] 2025-09-07T09:35:00.9088426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0009s] [ 58%] 2025-09-07T09:35:00.9088700Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0645s] [ 58%] 2025-09-07T09:35:00.9090531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9090862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0582s] [ 58%] 2025-09-07T09:35:00.9091157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9091422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0299s] [ 58%] 2025-09-07T09:35:00.9091687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0008s] [ 58%] 2025-09-07T09:35:00.9091953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0614s] [ 58%] 2025-09-07T09:35:00.9092260Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0007s] [ 58%] 2025-09-07T09:35:00.9092555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0593s] [ 58%] 2025-09-07T09:35:00.9092817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9093088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0616s] [ 58%] 2025-09-07T09:35:00.9093352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9093616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0620s] [ 58%] 2025-09-07T09:35:00.9093877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9094146Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0646s] [ 58%] 2025-09-07T09:35:00.9094411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0007s] [ 58%] 2025-09-07T09:35:00.9094677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0412s] [ 58%] 2025-09-07T09:35:00.9094969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9095245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0401s] [ 58%] 2025-09-07T09:35:00.9095508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9098120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0474s] [ 58%] 2025-09-07T09:35:00.9098388Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9098693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0427s] [ 58%] 2025-09-07T09:35:00.9099050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9099318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0636s] [ 58%] 2025-09-07T09:35:00.9099587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9099858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0521s] [ 58%] 2025-09-07T09:35:00.9100125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9100390Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0497s] [ 58%] 2025-09-07T09:35:00.9100655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0006s] [ 58%] 2025-09-07T09:35:00.9100924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0662s] [ 58%] 2025-09-07T09:35:00.9101191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0009s] [ 58%] 2025-09-07T09:35:00.9101458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0548s] [ 58%] 2025-09-07T09:35:00.9101759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0007s] [ 58%] 2025-09-07T09:35:00.9102042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0559s] [ 58%] 2025-09-07T09:35:00.9102306Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0009s] [ 58%] 2025-09-07T09:35:00.9102570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0534s] [ 58%] 2025-09-07T09:35:00.9102849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0010s] [ 58%] 2025-09-07T09:35:00.9104763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0578s] [ 58%] 2025-09-07T09:35:00.9105042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0009s] [ 58%] 2025-09-07T09:35:00.9105308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0613s] [ 58%] 2025-09-07T09:35:00.9105573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9105837Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0516s] [ 58%] 2025-09-07T09:35:00.9106099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0005s] [ 58%] 2025-09-07T09:35:00.9106362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0625s] [ 58%] 2025-09-07T09:35:00.9106695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0009s] [ 58%] 2025-09-07T09:35:00.9106959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0665s] [ 58%] 2025-09-07T09:35:00.9107221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0007s] [ 58%] 2025-09-07T09:35:00.9107520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0407s] [ 58%] 2025-09-07T09:35:00.9107805Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0004s] [ 58%] 2025-09-07T09:35:00.9108068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0350s] [ 58%] 2025-09-07T09:35:00.9108329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0008s] [ 58%] 2025-09-07T09:35:00.9108593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16 PASSED [0.0375s] [ 58%] 2025-09-07T09:35:00.9108877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16 PASSED [0.0004s] [ 58%] 2025-09-07T09:35:00.9109157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16 PASSED [0.0378s] [ 58%] 2025-09-07T09:35:00.9109417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16 PASSED [0.0004s] [ 58%] 2025-09-07T09:35:00.9109616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_different_dk_dv_cuda SKIPPED [0.0001s] (cuDNN Attention is not 
supported on this system) [ 58%] 2025-09-07T09:35:00.9111305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.2555s] [ 58%] 2025-09-07T09:35:00.9111594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0083s] [ 58%] 2025-09-07T09:35:00.9111869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0078s] [ 58%] 2025-09-07T09:35:00.9112143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0129s] [ 58%] 2025-09-07T09:35:00.9112417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0166s] [ 58%] 2025-09-07T09:35:00.9112691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0199s] [ 58%] 2025-09-07T09:35:00.9112989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0299s] [ 58%] 2025-09-07T09:35:00.9113276Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0261s] [ 58%] 2025-09-07T09:35:00.9113551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0302s] [ 58%] 2025-09-07T09:35:00.9113823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0193s] [ 58%] 2025-09-07T09:35:00.9114097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0253s] [ 58%] 2025-09-07T09:35:00.9114386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0253s] [ 58%] 2025-09-07T09:35:00.9114674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0182s] [ 58%] 2025-09-07T09:35:00.9114947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0088s] [ 58%] 2025-09-07T09:35:00.9115220Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0094s] [ 58%] 2025-09-07T09:35:00.9115492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0166s] [ 58%] 2025-09-07T09:35:00.9115768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0200s] [ 58%] 2025-09-07T09:35:00.9116043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0311s] [ 58%] 2025-09-07T09:35:00.9116318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0329s] [ 58%] 2025-09-07T09:35:00.9117986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0271s] [ 58%] 2025-09-07T09:35:00.9118268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0344s] [ 58%] 2025-09-07T09:35:00.9118583Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0366s] [ 58%] 2025-09-07T09:35:00.9118873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0098s] [ 58%] 2025-09-07T09:35:00.9119147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0237s] [ 58%] 2025-09-07T09:35:00.9119419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0203s] [ 58%] 2025-09-07T09:35:00.9119697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0099s] [ 58%] 2025-09-07T09:35:00.9119990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0121s] [ 58%] 2025-09-07T09:35:00.9120278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0222s] [ 58%] 2025-09-07T09:35:00.9120553Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0210s] [ 58%] 2025-09-07T09:35:00.9120823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0256s] [ 58%] 2025-09-07T09:35:00.9121097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0222s] [ 58%] 2025-09-07T09:35:00.9121368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0323s] [ 58%] 2025-09-07T09:35:00.9121641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0304s] [ 58%] 2025-09-07T09:35:00.9121912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0102s] [ 58%] 2025-09-07T09:35:00.9122187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0105s] [ 58%] 2025-09-07T09:35:00.9122461Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0059s] [ 58%] 2025-09-07T09:35:00.9122757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0084s] [ 58%] 2025-09-07T09:35:00.9123042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0076s] [ 59%] 2025-09-07T09:35:00.9124626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0074s] [ 59%] 2025-09-07T09:35:00.9124917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0133s] [ 59%] 2025-09-07T09:35:00.9125273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9125570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0111s] [ 59%] 2025-09-07T09:35:00.9125932Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9126205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0073s] [ 59%] 2025-09-07T09:35:00.9126618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9126889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0086s] [ 59%] 2025-09-07T09:35:00.9127242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9127521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0074s] [ 59%] 2025-09-07T09:35:00.9127796Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0063s] [ 59%] 2025-09-07T09:35:00.9128070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0066s] [ 59%] 2025-09-07T09:35:00.9128375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0062s] [ 59%] 2025-09-07T09:35:00.9128668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0064s] [ 59%] 2025-09-07T09:35:00.9128942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0091s] [ 59%] 2025-09-07T09:35:00.9129214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0161s] [ 59%] 2025-09-07T09:35:00.9129486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0119s] [ 59%] 2025-09-07T09:35:00.9129782Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0166s] [ 59%] 2025-09-07T09:35:00.9130149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9131719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0124s] [ 59%] 2025-09-07T09:35:00.9132079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9132352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0109s] [ 59%] 2025-09-07T09:35:00.9132698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9132969Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0194s] [ 59%] 2025-09-07T09:35:00.9133321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9133594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0259s] [ 59%] 2025-09-07T09:35:00.9133887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0232s] [ 59%] 2025-09-07T09:35:00.9134174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0160s] [ 59%] 2025-09-07T09:35:00.9134450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0077s] [ 59%] 2025-09-07T09:35:00.9134723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0069s] [ 59%] 2025-09-07T09:35:00.9134997Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0159s] [ 59%] 2025-09-07T09:35:00.9135294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0191s] [ 59%] 2025-09-07T09:35:00.9135582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0170s] [ 59%] 2025-09-07T09:35:00.9135855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0192s] [ 59%] 2025-09-07T09:35:00.9136204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9136476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0069s] [ 59%] 2025-09-07T09:35:00.9136901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 59%] 2025-09-07T09:35:00.9138465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0056s] [ 59%] 2025-09-07T09:35:00.9138822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9139152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0110s] [ 59%] 2025-09-07T09:35:00.9139538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9139831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0069s] [ 59%] 2025-09-07T09:35:00.9140104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0064s] [ 59%] 2025-09-07T09:35:00.9140374Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0076s] [ 59%] 2025-09-07T09:35:00.9140648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0089s] [ 59%] 2025-09-07T09:35:00.9140941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0059s] [ 59%] 2025-09-07T09:35:00.9141237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0072s] [ 59%] 2025-09-07T09:35:00.9141510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0082s] [ 59%] 2025-09-07T09:35:00.9141784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0146s] [ 59%] 2025-09-07T09:35:00.9142057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0154s] [ 59%] 2025-09-07T09:35:00.9142406Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9142677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0128s] [ 59%] 2025-09-07T09:35:00.9143025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9143298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0067s] [ 59%] 2025-09-07T09:35:00.9143649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9143935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0153s] [ 59%] 2025-09-07T09:35:00.9145593Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 59%] 2025-09-07T09:35:00.9145873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0223s] [ 59%] 2025-09-07T09:35:00.9146144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0241s] [ 59%] 2025-09-07T09:35:00.9146434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0088s] [ 59%] 2025-09-07T09:35:00.9146772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0104s] [ 59%] 2025-09-07T09:35:00.9147047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0052s] [ 59%] 2025-09-07T09:35:00.9147324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0084s] [ 59%] 2025-09-07T09:35:00.9147599Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0098s] [ 59%] 2025-09-07T09:35:00.9147873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0105s] [ 59%] 2025-09-07T09:35:00.9148145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0106s] [ 59%] 2025-09-07T09:35:00.9148414Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0105s] [ 59%] 2025-09-07T09:35:00.9148686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0052s] [ 59%] 2025-09-07T09:35:00.9148955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0053s] [ 59%] 2025-09-07T09:35:00.9149223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0055s] [ 59%] 2025-09-07T09:35:00.9149530Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0084s] [ 59%] 2025-09-07T09:35:00.9149818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0122s] [ 59%] 2025-09-07T09:35:00.9150094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0101s] [ 59%] 2025-09-07T09:35:00.9150365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0175s] [ 59%] 2025-09-07T09:35:00.9152036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0100s] [ 59%] 2025-09-07T09:35:00.9152341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0055s] [ 59%] 2025-09-07T09:35:00.9152613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0077s] [ 59%] 2025-09-07T09:35:00.9152886Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0104s] [ 59%] 2025-09-07T09:35:00.9153160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0104s] [ 59%] 2025-09-07T09:35:00.9153432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0107s] [ 59%] 2025-09-07T09:35:00.9153703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0114s] [ 59%] 2025-09-07T09:35:00.9153983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0063s] [ 59%] 2025-09-07T09:35:00.9154253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0066s] [ 59%] 2025-09-07T09:35:00.9154526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0053s] [ 59%] 2025-09-07T09:35:00.9154798Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0055s] [ 59%] 2025-09-07T09:35:00.9155083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0055s] [ 59%] 2025-09-07T09:35:00.9155373Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0062s] [ 59%] 2025-09-07T09:35:00.9155642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0063s] [ 59%] 2025-09-07T09:35:00.9155916Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0071s] [ 59%] 2025-09-07T09:35:00.9156201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0087s] [ 59%] 2025-09-07T09:35:00.9156549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0100s] [ 59%] 2025-09-07T09:35:00.9156823Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0324s] [ 59%] 2025-09-07T09:35:00.9157100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0431s] [ 59%] 2025-09-07T09:35:00.9158676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0572s] [ 59%] 2025-09-07T09:35:00.9158965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0536s] [ 59%] 2025-09-07T09:35:00.9159239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0425s] [ 59%] 2025-09-07T09:35:00.9159514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0082s] [ 59%] 2025-09-07T09:35:00.9159793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0367s] [ 59%] 2025-09-07T09:35:00.9160064Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0333s] [ 59%] 2025-09-07T09:35:00.9160335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0342s] [ 59%] 2025-09-07T09:35:00.9160644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0543s] [ 59%] 2025-09-07T09:35:00.9160932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0513s] [ 59%] 2025-09-07T09:35:00.9161211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0459s] [ 59%] 2025-09-07T09:35:00.9161485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0442s] [ 59%] 2025-09-07T09:35:00.9161778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0580s] [ 59%] 2025-09-07T09:35:00.9162065Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0472s] [ 59%] 2025-09-07T09:35:00.9162336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0327s] [ 59%] 2025-09-07T09:35:00.9162612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0317s] [ 59%] 2025-09-07T09:35:00.9162887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0281s] [ 59%] 2025-09-07T09:35:00.9163160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0331s] [ 59%] 2025-09-07T09:35:00.9163434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0360s] [ 59%] 2025-09-07T09:35:00.9163709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0447s] [ 59%] 2025-09-07T09:35:00.9165272Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0294s] [ 59%] 2025-09-07T09:35:00.9165554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0756s] [ 59%] 2025-09-07T09:35:00.9165825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0711s] [ 59%] 2025-09-07T09:35:00.9166118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0373s] [ 59%] 2025-09-07T09:35:00.9166405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0452s] [ 59%] 2025-09-07T09:35:00.9166731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0440s] [ 59%] 2025-09-07T09:35:00.9167009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0326s] [ 59%] 2025-09-07T09:35:00.9167313Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0429s] [ 59%] 2025-09-07T09:35:00.9167600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0368s] [ 59%] 2025-09-07T09:35:00.9167878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0451s] [ 59%] 2025-09-07T09:35:00.9168150Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0483s] [ 60%] 2025-09-07T09:35:00.9168424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0227s] [ 60%] 2025-09-07T09:35:00.9168695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0376s] [ 60%] 2025-09-07T09:35:00.9168968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0326s] [ 60%] 2025-09-07T09:35:00.9169244Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0451s] [ 60%] 2025-09-07T09:35:00.9169519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0408s] [ 60%] 2025-09-07T09:35:00.9169792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0295s] [ 60%] 2025-09-07T09:35:00.9170062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0412s] [ 60%] 2025-09-07T09:35:00.9171742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9172050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0437s] [ 60%] 2025-09-07T09:35:00.9172403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != 
seq_len_k) [ 60%] 2025-09-07T09:35:00.9172676Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0249s] [ 60%] 2025-09-07T09:35:00.9173043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9173329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0102s] [ 60%] 2025-09-07T09:35:00.9173677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9173953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0274s] [ 60%] 2025-09-07T09:35:00.9174224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0285s] [ 60%] 2025-09-07T09:35:00.9174501Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0337s] [ 60%] 2025-09-07T09:35:00.9174774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0348s] [ 60%] 2025-09-07T09:35:00.9175047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0341s] [ 60%] 2025-09-07T09:35:00.9175322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0332s] [ 60%] 2025-09-07T09:35:00.9175596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0150s] [ 60%] 2025-09-07T09:35:00.9175894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0331s] [ 60%] 2025-09-07T09:35:00.9176178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0274s] [ 60%] 2025-09-07T09:35:00.9176621Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0006s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9176898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0101s] [ 60%] 2025-09-07T09:35:00.9177279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9178879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0247s] [ 60%] 2025-09-07T09:35:00.9179365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9179638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0378s] [ 60%] 2025-09-07T09:35:00.9179986Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9180259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0475s] [ 60%] 2025-09-07T09:35:00.9180535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0451s] [ 60%] 2025-09-07T09:35:00.9180809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0333s] [ 60%] 2025-09-07T09:35:00.9181083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0199s] [ 60%] 2025-09-07T09:35:00.9181356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0085s] [ 60%] 2025-09-07T09:35:00.9181664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0187s] [ 60%] 2025-09-07T09:35:00.9181952Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0279s] [ 60%] 2025-09-07T09:35:00.9182225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0106s] [ 60%] 2025-09-07T09:35:00.9182498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0239s] [ 60%] 2025-09-07T09:35:00.9182862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9183145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0203s] [ 60%] 2025-09-07T09:35:00.9183494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9183771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED 
[0.0094s] [ 60%] 2025-09-07T09:35:00.9184120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9185679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0098s] [ 60%] 2025-09-07T09:35:00.9186039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9186311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0101s] [ 60%] 2025-09-07T09:35:00.9186717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0095s] [ 60%] 2025-09-07T09:35:00.9186991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0089s] [ 60%] 2025-09-07T09:35:00.9187294Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0091s] [ 60%] 2025-09-07T09:35:00.9187584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0089s] [ 60%] 2025-09-07T09:35:00.9187855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0102s] [ 60%] 2025-09-07T09:35:00.9188132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0107s] [ 60%] 2025-09-07T09:35:00.9188434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0118s] [ 60%] 2025-09-07T09:35:00.9188729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0107s] [ 60%] 2025-09-07T09:35:00.9189076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9189348Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0108s] [ 60%] 2025-09-07T09:35:00.9189700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9189971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0088s] [ 60%] 2025-09-07T09:35:00.9190316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9190590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0098s] [ 60%] 2025-09-07T09:35:00.9190940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 SKIPPED [0.0005s] (Flash V2 does not accept is_casual when seq_len_q != seq_len_k) [ 60%] 2025-09-07T09:35:00.9192528Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0112s] [ 60%] 2025-09-07T09:35:00.9192833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0092s] [ 60%] 2025-09-07T09:35:00.9193120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0078s] [ 60%] 2025-09-07T09:35:00.9193395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0079s] [ 60%] 2025-09-07T09:35:00.9193666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0080s] [ 60%] 2025-09-07T09:35:00.9193950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0089s] [ 60%] 2025-09-07T09:35:00.9194243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0092s] [ 60%] 2025-09-07T09:35:00.9194514Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0094s] [ 60%] 2025-09-07T09:35:00.9194783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0092s] [ 60%] 2025-09-07T09:35:00.9195057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0083s] [ 60%] 2025-09-07T09:35:00.9195328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0079s] [ 60%] 2025-09-07T09:35:00.9195595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0078s] [ 60%] 2025-09-07T09:35:00.9195896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0079s] [ 60%] 2025-09-07T09:35:00.9196166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0091s] [ 60%] 2025-09-07T09:35:00.9196438Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0088s] [ 60%] 2025-09-07T09:35:00.9196775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0085s] [ 60%] 2025-09-07T09:35:00.9197075Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0089s] [ 60%] 2025-09-07T09:35:00.9197361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0086s] [ 60%] 2025-09-07T09:35:00.9197637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0086s] [ 60%] 2025-09-07T09:35:00.9197909Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0084s] [ 60%] 2025-09-07T09:35:00.9199728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0083s] [ 60%] 2025-09-07T09:35:00.9200023Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0094s] [ 60%] 2025-09-07T09:35:00.9200295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0092s] [ 60%] 2025-09-07T09:35:00.9200571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0104s] [ 60%] 2025-09-07T09:35:00.9200844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0092s] [ 60%] 2025-09-07T09:35:00.9201119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0094s] [ 60%] 2025-09-07T09:35:00.9201391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0082s] [ 60%] 2025-09-07T09:35:00.9201663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0083s] [ 60%] 2025-09-07T09:35:00.9201935Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0081s] [ 60%] 2025-09-07T09:35:00.9202205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16 PASSED [0.0093s] [ 60%] 2025-09-07T09:35:00.9202473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel1_cuda_float16 PASSED [0.0088s] [ 60%] 2025-09-07T09:35:00.9202762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16 PASSED [0.0094s] [ 60%] 2025-09-07T09:35:00.9203052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel1_cuda_float16 PASSED [0.0092s] [ 60%] 2025-09-07T09:35:00.9203314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_False_cuda SKIPPED [0.0005s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 60%] 2025-09-07T09:35:00.9205143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_True_cuda SKIPPED [0.0004s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 60%] 2025-09-07T09:35:00.9205403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_False_cuda SKIPPED [0.0010s] (skipIfRocm: 
test doesn't currently work on the ROCm stack) [ 60%] 2025-09-07T09:35:00.9205680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel1_warn_only_True_cuda SKIPPED [0.0004s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 60%] 2025-09-07T09:35:00.9206011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0124s] [ 60%] 2025-09-07T09:35:00.9206341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda ('RERUN', {'yellow': True}) [0.0123s] [ 60%] 2025-09-07T09:35:00.9206740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda ('RERUN', {'yellow': True}) [0.0125s] [ 60%] 2025-09-07T09:35:00.9207046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda FAILED [0.0126s] [ 60%] 2025-09-07T09:35:00.9207050Z 2025-09-07T09:35:00.9207114Z ==================================== RERUNS ==================================== 2025-09-07T09:35:00.9207377Z _ TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda _ 2025-09-07T09:35:00.9207430Z Traceback (most recent call last): 2025-09-07T09:35:00.9207579Z File 
"/var/lib/jenkins/pytorch/test/test_transformers.py", line 4001, in test_fused_kernels_nested_broadcasting 2025-09-07T09:35:00.9207706Z self.assertEqual(actual.contiguous(), math_ref.contiguous().to(dtype), atol=1.5e-3, rtol=1e-2) 2025-09-07T09:35:00.9207878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4179, in assertEqual 2025-09-07T09:35:00.9207953Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-09-07T09:35:00.9208005Z AssertionError: Tensor-likes are not close! 2025-09-07T09:35:00.9208007Z 2025-09-07T09:35:00.9208053Z Mismatched elements: 1 / 15360 (0.0%) 2025-09-07T09:35:00.9208162Z Greatest absolute difference: 0.001842498779296875 at index (6, 1, 2) (up to 0.0015 allowed) 2025-09-07T09:35:00.9208261Z Greatest relative difference: 0.2252197265625 at index (6, 1, 2) (up to 0.01 allowed) 2025-09-07T09:35:00.9208264Z 2025-09-07T09:35:00.9208306Z The failure occurred for item [8] 2025-09-07T09:35:00.9208336Z 2025-09-07T09:35:00.9208428Z To execute this test, run the following from the base repo dir: 2025-09-07T09:35:00.9208763Z PYTORCH_TEST_WITH_ROCM=1 python test/test_transformers.py TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda 2025-09-07T09:35:00.9208767Z 2025-09-07T09:35:00.9208856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T09:35:00.9209115Z _ TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda _ 2025-09-07T09:35:00.9209160Z Traceback (most recent call last): 2025-09-07T09:35:00.9209298Z File "/var/lib/jenkins/pytorch/test/test_transformers.py", line 4001, in test_fused_kernels_nested_broadcasting 
2025-09-07T09:35:00.9209415Z self.assertEqual(actual.contiguous(), math_ref.contiguous().to(dtype), atol=1.5e-3, rtol=1e-2) 2025-09-07T09:35:00.9209593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4179, in assertEqual 2025-09-07T09:35:00.9209692Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-09-07T09:35:00.9209742Z AssertionError: Tensor-likes are not close! 2025-09-07T09:35:00.9209744Z 2025-09-07T09:35:00.9209788Z Mismatched elements: 1 / 15360 (0.0%) 2025-09-07T09:35:00.9209893Z Greatest absolute difference: 0.001842498779296875 at index (6, 1, 2) (up to 0.0015 allowed) 2025-09-07T09:35:00.9209989Z Greatest relative difference: 0.2252197265625 at index (6, 1, 2) (up to 0.01 allowed) 2025-09-07T09:35:00.9209991Z 2025-09-07T09:35:00.9210033Z The failure occurred for item [8] 2025-09-07T09:35:00.9210037Z 2025-09-07T09:35:00.9210107Z To execute this test, run the following from the base repo dir: 2025-09-07T09:35:00.9210436Z PYTORCH_TEST_WITH_ROCM=1 python test/test_transformers.py TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda 2025-09-07T09:35:00.9210444Z 2025-09-07T09:35:00.9210530Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T09:35:00.9210585Z =================================== FAILURES =================================== 2025-09-07T09:35:00.9210851Z _ TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda _ 2025-09-07T09:35:00.9210898Z Traceback (most recent call last): 2025-09-07T09:35:00.9211039Z File "/var/lib/jenkins/pytorch/test/test_transformers.py", line 4001, in test_fused_kernels_nested_broadcasting 
2025-09-07T09:35:00.9211155Z self.assertEqual(actual.contiguous(), math_ref.contiguous().to(dtype), atol=1.5e-3, rtol=1e-2) 2025-09-07T09:35:00.9211320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4179, in assertEqual 2025-09-07T09:35:00.9211389Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-09-07T09:35:00.9211443Z AssertionError: Tensor-likes are not close! 2025-09-07T09:35:00.9211445Z 2025-09-07T09:35:00.9211489Z Mismatched elements: 1 / 15360 (0.0%) 2025-09-07T09:35:00.9211602Z Greatest absolute difference: 0.001842498779296875 at index (6, 1, 2) (up to 0.0015 allowed) 2025-09-07T09:35:00.9211700Z Greatest relative difference: 0.2252197265625 at index (6, 1, 2) (up to 0.01 allowed) 2025-09-07T09:35:00.9211702Z 2025-09-07T09:35:00.9211751Z The failure occurred for item [8] 2025-09-07T09:35:00.9211753Z 2025-09-07T09:35:00.9211824Z To execute this test, run the following from the base repo dir: 2025-09-07T09:35:00.9212170Z PYTORCH_TEST_WITH_ROCM=1 python test/test_transformers.py TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda 2025-09-07T09:35:00.9212184Z 2025-09-07T09:35:00.9212270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T09:35:00.9212466Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_transformers/test_transformers-8c1d8beb2ffa9920.xml - 2025-09-07T09:35:00.9212529Z =========================== short test summary info ============================ 2025-09-07T09:35:00.9212889Z FAILED [0.0126s] test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda - AssertionError: 
Tensor-likes are not close! 2025-09-07T09:35:00.9212891Z 2025-09-07T09:35:00.9212936Z Mismatched elements: 1 / 15360 (0.0%) 2025-09-07T09:35:00.9213041Z Greatest absolute difference: 0.001842498779296875 at index (6, 1, 2) (up to 0.0015 allowed) 2025-09-07T09:35:00.9213135Z Greatest relative difference: 0.2252197265625 at index (6, 1, 2) (up to 0.01 allowed) 2025-09-07T09:35:00.9213157Z 2025-09-07T09:35:00.9213215Z The failure occurred for item [8] 2025-09-07T09:35:00.9213217Z 2025-09-07T09:35:00.9213288Z To execute this test, run the following from the base repo dir: 2025-09-07T09:35:00.9213618Z PYTORCH_TEST_WITH_ROCM=1 python test/test_transformers.py TestSDPACudaOnlyCUDA.test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda 2025-09-07T09:35:00.9213621Z 2025-09-07T09:35:00.9213708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-09-07T09:35:00.9213772Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-09-07T09:35:00.9213857Z ====== 1 failed, 4475 passed, 2972 skipped, 2 rerun in 117.49s (0:01:57) ======= 2025-09-07T09:35:00.9213896Z Got exit code 1 2025-09-07T09:35:00.9213939Z Retrying single test... 2025-09-07T09:35:00.9214375Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T09:35:00.9214420Z import pkg_resources 2025-09-07T09:35:00.9214568Z Test results will be stored in test-reports/python-pytest/test_transformers/test_transformers-8cab0ccffa18ace7.xml 2025-09-07T09:35:00.9214630Z ============================= test session starts ============================== 2025-09-07T09:35:00.9214744Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T09:35:00.9214789Z cachedir: .pytest_cache 2025-09-07T09:35:00.9214947Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T09:35:00.9214994Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T09:35:00.9215035Z configfile: pytest.ini 2025-09-07T09:35:00.9215202Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T09:35:00.9215286Z collecting ... collected 12244 items / 12243 deselected / 1 selected 2025-09-07T09:35:00.9215657Z stepcurrent: skipping 7447 already run items. 
Running only test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda 2025-09-07T09:35:00.9215706Z Running 1 items in this shard 2025-09-07T09:35:00.9215708Z 2025-09-07T09:35:00.9216038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.4391s] [100%] 2025-09-07T09:35:00.9216054Z 2025-09-07T09:35:00.9216247Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_transformers/test_transformers-8cab0ccffa18ace7.xml - 2025-09-07T09:35:00.9216316Z ===================== 1 passed, 12243 deselected in 0.92s ====================== 2025-09-07T09:35:00.9216354Z Got exit code 0 2025-09-07T09:35:00.9216441Z Test succeeeded in new process, continuing with the rest of the tests 2025-09-07T09:35:00.9216948Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 
2025-09-07T09:35:00.9216995Z import pkg_resources 2025-09-07T09:35:00.9217142Z Test results will be stored in test-reports/python-pytest/test_transformers/test_transformers-bcc35b60640cc0b7.xml 2025-09-07T09:35:00.9217231Z ============================= test session starts ============================== 2025-09-07T09:35:00.9217358Z platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-09-07T09:35:00.9217402Z cachedir: .pytest_cache 2025-09-07T09:35:00.9217558Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-09-07T09:35:00.9217605Z rootdir: /var/lib/jenkins/pytorch 2025-09-07T09:35:00.9218885Z configfile: pytest.ini 2025-09-07T09:35:00.9219108Z plugins: hypothesis-5.35.1, subtests-0.13.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, typeguard-4.3.0 2025-09-07T09:35:00.9219190Z collecting ... collected 12244 items / 7448 deselected / 4796 selected 2025-09-07T09:35:00.9219245Z stepcurrent: skipping 7448 already run items. 
2025-09-07T09:35:00.9219292Z Running 4796 items in this shard 2025-09-07T09:35:00.9219294Z 2025-09-07T09:35:00.9219604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.4416s] [ 0%] 2025-09-07T09:35:00.9219910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0185s] [ 0%] 2025-09-07T09:35:00.9220215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0171s] [ 0%] 2025-09-07T09:35:00.9220520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0124s] [ 0%] 2025-09-07T09:35:00.9220822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0122s] [ 0%] 2025-09-07T09:35:00.9221122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0093s] [ 0%] 2025-09-07T09:35:00.9221453Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0135s] [ 0%] 2025-09-07T09:35:00.9221773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0115s] [ 0%] 2025-09-07T09:35:00.9222076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0115s] [ 0%] 2025-09-07T09:35:00.9222378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0118s] [ 0%] 2025-09-07T09:35:00.9222681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0115s] [ 0%] 2025-09-07T09:35:00.9222999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0119s] [ 0%] 2025-09-07T09:35:00.9223313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0120s] [ 0%] 2025-09-07T09:35:00.9223612Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0086s] [ 0%] 2025-09-07T09:35:00.9223915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0114s] [ 0%] 2025-09-07T09:35:00.9224218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0116s] [ 0%] 2025-09-07T09:35:00.9225586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0117s] [ 0%] 2025-09-07T09:35:00.9225889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0118s] [ 0%] 2025-09-07T09:35:00.9226195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0116s] [ 0%] 2025-09-07T09:35:00.9226561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0119s] [ 0%] 2025-09-07T09:35:00.9226884Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0119s] [ 0%] 2025-09-07T09:35:00.9227201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0085s] [ 0%] 2025-09-07T09:35:00.9227507Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0145s] [ 0%] 2025-09-07T09:35:00.9227808Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0113s] [ 0%] 2025-09-07T09:35:00.9228108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0112s] [ 0%] 2025-09-07T09:35:00.9228433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0115s] [ 0%] 2025-09-07T09:35:00.9228749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0113s] [ 0%] 2025-09-07T09:35:00.9229047Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0116s] [ 0%] 2025-09-07T09:35:00.9229347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0117s] [ 0%] 2025-09-07T09:35:00.9229646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0083s] [ 0%] 2025-09-07T09:35:00.9229949Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0123s] [ 0%] 2025-09-07T09:35:00.9230250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0117s] [ 0%] 2025-09-07T09:35:00.9230554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0109s] [ 0%] 2025-09-07T09:35:00.9230856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0114s] [ 0%] 2025-09-07T09:35:00.9231157Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0110s] [ 0%] 2025-09-07T09:35:00.9231469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0114s] [ 0%] 2025-09-07T09:35:00.9233412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0113s] [ 0%] 2025-09-07T09:35:00.9233712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0083s] [ 0%] 2025-09-07T09:35:00.9234016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0109s] [ 0%] 2025-09-07T09:35:00.9234333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0105s] [ 0%] 2025-09-07T09:35:00.9234646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0105s] [ 0%] 2025-09-07T09:35:00.9234946Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0107s] [ 0%] 2025-09-07T09:35:00.9235247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0105s] [ 0%] 2025-09-07T09:35:00.9235547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0106s] [ 0%] 2025-09-07T09:35:00.9235845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0108s] [ 0%] 2025-09-07T09:35:00.9236140Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0079s] [ 0%] 2025-09-07T09:35:00.9236443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0102s] [ 0%] 2025-09-07T09:35:00.9236806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0106s] [ 1%] 2025-09-07T09:35:00.9237108Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0105s] [ 1%] 2025-09-07T09:35:00.9237436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0108s] [ 1%] 2025-09-07T09:35:00.9237768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0106s] [ 1%] 2025-09-07T09:35:00.9238068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0108s] [ 1%] 2025-09-07T09:35:00.9238368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0107s] [ 1%] 2025-09-07T09:35:00.9238666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0078s] [ 1%] 2025-09-07T09:35:00.9238989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0016s] [ 1%] 2025-09-07T09:35:00.9239305Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0015s] [ 1%] 2025-09-07T09:35:00.9240647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0016s] [ 1%] 2025-09-07T09:35:00.9240947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0016s] [ 1%] 2025-09-07T09:35:00.9241247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda PASSED [0.0016s] [ 1%] 2025-09-07T09:35:00.9241543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda PASSED [0.0016s] [ 1%] 2025-09-07T09:35:00.9241842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda PASSED [0.0017s] [ 1%] 2025-09-07T09:35:00.9242139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda PASSED [0.0015s] [ 1%] 2025-09-07T09:35:00.9242510Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9242898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9243273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9243635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9243997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9244372Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 
1%] 2025-09-07T09:35:00.9244745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9245103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9245469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9245828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9246189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9246614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v 
unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9246974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9247331Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9247723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9249133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9249492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9249871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED 
[0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9250248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9250606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9250967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9251329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9251688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9252046Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9252406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9252764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9253149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9253525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9253885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 
2025-09-07T09:35:00.9254258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9254629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9254983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9255344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 1%] 2025-09-07T09:35:00.9255706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9256064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported 
on ROCm for now) [ 2%] 2025-09-07T09:35:00.9257504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9257866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9258224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9258617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9259050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9259414Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0005s] (head_dim != 
head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9259790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9260165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9260523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9260882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9261241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9261602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED 
[0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9261959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9262319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9262677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9263051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9263421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9263781Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9264141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9264514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9265901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9266260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9266697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 
2025-09-07T09:35:00.9267055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9267413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9267772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9268129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9268518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda SKIPPED [0.0005s] (head_dim != head_dim_v unsupported on ROCm for now) [ 2%] 2025-09-07T09:35:00.9268891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel1_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda SKIPPED [0.0004s] (head_dim != head_dim_v unsupported on ROCm for 
now) [ 2%] 2025-09-07T09:35:00.9269159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_query_dense_cuda SKIPPED [0.0004s] (skipIfRocm: Efficient Attention on ROCM does not support head_dim != head_dim_v for now.) [ 2%] 2025-09-07T09:35:00.9269309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel0_cuda PASSED [0.0149s] [ 2%] 2025-09-07T09:35:00.9269452Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel1_cuda PASSED [0.0141s] [ 2%] 2025-09-07T09:35:00.9269580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda PASSED [0.0009s] [ 2%] 2025-09-07T09:35:00.9269724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_nested_cuda PASSED [0.0007s] [ 2%] 2025-09-07T09:35:00.9269954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_False_cuda SKIPPED [0.0001s] (cuDNN Attention is not supported on this system) [ 2%] 2025-09-07T09:35:00.9270156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_True_cuda SKIPPED [0.0001s] (cuDNN Attention is not supported on this system) [ 2%] 2025-09-07T09:35:00.9270313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float16_cuda_float16 PASSED [0.0241s] [ 2%] 2025-09-07T09:35:00.9270466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float32_cuda_float32 PASSED [0.0082s] [ 2%] 2025-09-07T09:35:00.9270604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contig_mask_bug_cuda PASSED [0.0110s] [ 2%] 2025-09-07T09:35:00.9270762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float16_cuda_float16 PASSED [0.0053s] [ 2%] 2025-09-07T09:35:00.9271936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float32_cuda_float32 PASSED 
[0.0023s] [ 2%] 2025-09-07T09:35:00.9272165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_backwards_determinism_cuda SKIPPED [0.0001s] (This test is not behaving deterministaclly non-deterministaclly on CI/CD) [ 2%] 2025-09-07T09:35:00.9272440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.3158s] [ 2%] 2025-09-07T09:35:00.9272715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0040s] [ 2%] 2025-09-07T09:35:00.9272984Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0097s] [ 2%] 2025-09-07T09:35:00.9273253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 2%] 2025-09-07T09:35:00.9273516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1789s] [ 3%] 2025-09-07T09:35:00.9273802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0045s] [ 3%] 2025-09-07T09:35:00.9274219Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 [W907 09:34:03.569952637 attention.cpp:916] Warning: Dropout mask should only be used for testing purposes. (function operator()) 2025-09-07T09:35:00.9274260Z PASSED [0.0812s] [ 3%] 2025-09-07T09:35:00.9274531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0042s] [ 3%] 2025-09-07T09:35:00.9274798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0127s] [ 3%] 2025-09-07T09:35:00.9275085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 3%] 2025-09-07T09:35:00.9275366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0107s] [ 3%] 2025-09-07T09:35:00.9275634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 3%] 2025-09-07T09:35:00.9275898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0095s] [ 3%] 2025-09-07T09:35:00.9276167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 3%] 2025-09-07T09:35:00.9276428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0098s] [ 3%] 2025-09-07T09:35:00.9276771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 3%] 2025-09-07T09:35:00.9277038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0084s] [ 3%] 2025-09-07T09:35:00.9277301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.9278580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0121s] [ 3%] 2025-09-07T09:35:00.9278882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 3%] 2025-09-07T09:35:00.9279163Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0117s] [ 3%] 2025-09-07T09:35:00.9279429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 3%] 2025-09-07T09:35:00.9279692Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0091s] [ 3%] 2025-09-07T09:35:00.9279957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0036s] [ 3%] 2025-09-07T09:35:00.9280243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0101s] [ 3%] 2025-09-07T09:35:00.9280525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 3%] 2025-09-07T09:35:00.9280789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0102s] [ 3%] 2025-09-07T09:35:00.9281052Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 3%] 2025-09-07T09:35:00.9281313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0071s] [ 3%] 2025-09-07T09:35:00.9281578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 3%] 2025-09-07T09:35:00.9281841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0108s] [ 3%] 2025-09-07T09:35:00.9282108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 3%] 2025-09-07T09:35:00.9282374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0101s] [ 3%] 2025-09-07T09:35:00.9282637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 3%] 2025-09-07T09:35:00.9282916Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0077s] [ 3%] 2025-09-07T09:35:00.9283194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.9283458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0124s] [ 3%] 2025-09-07T09:35:00.9283724Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 3%] 2025-09-07T09:35:00.9284981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0095s] [ 3%] 2025-09-07T09:35:00.9285267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 3%] 2025-09-07T09:35:00.9285545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0097s] [ 3%] 2025-09-07T09:35:00.9285810Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 3%] 2025-09-07T09:35:00.9286078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0118s] [ 3%] 2025-09-07T09:35:00.9286348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 3%] 2025-09-07T09:35:00.9286678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0117s] [ 3%] 2025-09-07T09:35:00.9286944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 3%] 2025-09-07T09:35:00.9287207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0110s] [ 3%] 2025-09-07T09:35:00.9287474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 3%] 2025-09-07T09:35:00.9287743Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0818s] [ 3%] 2025-09-07T09:35:00.9288012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0053s] [ 3%] 2025-09-07T09:35:00.9288309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0054s] [ 3%] 2025-09-07T09:35:00.9288593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0053s] [ 3%] 2025-09-07T09:35:00.9288857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0061s] [ 4%] 2025-09-07T09:35:00.9289123Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0062s] [ 4%] 2025-09-07T09:35:00.9289417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0061s] [ 4%] 2025-09-07T09:35:00.9289705Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0066s] [ 4%] 2025-09-07T09:35:00.9289969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0120s] [ 4%] 2025-09-07T09:35:00.9290236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0067s] [ 4%] 2025-09-07T09:35:00.9290502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0080s] [ 4%] 2025-09-07T09:35:00.9291791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0071s] [ 4%] 2025-09-07T09:35:00.9292057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.9292326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 4%] 2025-09-07T09:35:00.9292590Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 4%] 2025-09-07T09:35:00.9292855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 4%] 2025-09-07T09:35:00.9293119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0041s] [ 4%] 2025-09-07T09:35:00.9293401Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 4%] 2025-09-07T09:35:00.9293681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0084s] [ 4%] 2025-09-07T09:35:00.9293950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.9294213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0087s] [ 4%] 2025-09-07T09:35:00.9294478Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 4%] 2025-09-07T09:35:00.9294755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.9295033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.9295296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.9295563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 4%] 2025-09-07T09:35:00.9295826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.9296089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 4%] 2025-09-07T09:35:00.9296350Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 4%] 2025-09-07T09:35:00.9296680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.9296946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 4%] 2025-09-07T09:35:00.9298219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.9298518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0046s] [ 4%] 2025-09-07T09:35:00.9298798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 4%] 2025-09-07T09:35:00.9299104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 4%] 2025-09-07T09:35:00.9299370Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 4%] 2025-09-07T09:35:00.9299636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0559s] [ 4%] 2025-09-07T09:35:00.9299925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0049s] [ 4%] 2025-09-07T09:35:00.9300201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0054s] [ 4%] 2025-09-07T09:35:00.9300464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0049s] [ 4%] 2025-09-07T09:35:00.9300726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0054s] [ 4%] 2025-09-07T09:35:00.9300991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0056s] [ 4%] 2025-09-07T09:35:00.9301256Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0056s] [ 4%] 2025-09-07T09:35:00.9301526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 4%] 2025-09-07T09:35:00.9301788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0043s] [ 4%] 2025-09-07T09:35:00.9302052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0042s] [ 4%] 2025-09-07T09:35:00.9302314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0111s] [ 4%] 2025-09-07T09:35:00.9302595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0070s] [ 4%] 2025-09-07T09:35:00.9302874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 4%] 2025-09-07T09:35:00.9303142Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 4%] 2025-09-07T09:35:00.9303406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0037s] [ 4%] 2025-09-07T09:35:00.9304677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 4%] 2025-09-07T09:35:00.9304961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0035s] [ 5%] 2025-09-07T09:35:00.9305239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9305504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0045s] [ 5%] 2025-09-07T09:35:00.9305774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0036s] [ 5%] 2025-09-07T09:35:00.9306040Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0041s] [ 5%] 2025-09-07T09:35:00.9306306Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 5%] 2025-09-07T09:35:00.9306650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 5%] 2025-09-07T09:35:00.9306919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 5%] 2025-09-07T09:35:00.9307185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9307451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9307712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 5%] 2025-09-07T09:35:00.9308008Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9308294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0057s] [ 5%] 2025-09-07T09:35:00.9308556Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9308820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 5%] 2025-09-07T09:35:00.9309114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 5%] 2025-09-07T09:35:00.9309391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 5%] 2025-09-07T09:35:00.9309655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 5%] 2025-09-07T09:35:00.9309917Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0062s] [ 5%] 2025-09-07T09:35:00.9310181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 5%] 2025-09-07T09:35:00.9311458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 5%] 2025-09-07T09:35:00.9311726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9311987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 5%] 2025-09-07T09:35:00.9312251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.9312514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9312775Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9313055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 5%] 2025-09-07T09:35:00.9313334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9313595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9313858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9314119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 5%] 2025-09-07T09:35:00.9314398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9314676Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 5%] 2025-09-07T09:35:00.9314943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 5%] 2025-09-07T09:35:00.9315205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.9315470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.9315731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0072s] [ 5%] 2025-09-07T09:35:00.9315994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 5%] 2025-09-07T09:35:00.9316261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 5%] 2025-09-07T09:35:00.9316595Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9317864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 5%] 2025-09-07T09:35:00.9318167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 5%] 2025-09-07T09:35:00.9318449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0040s] [ 5%] 2025-09-07T09:35:00.9318712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9318977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 5%] 2025-09-07T09:35:00.9319243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.9319522Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 5%] 2025-09-07T09:35:00.9319800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 5%] 2025-09-07T09:35:00.9320062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9320327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 6%] 2025-09-07T09:35:00.9320591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0085s] [ 6%] 2025-09-07T09:35:00.9320857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.9321116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 6%] 2025-09-07T09:35:00.9321379Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9321642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 6%] 2025-09-07T09:35:00.9321905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.9322182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 6%] 2025-09-07T09:35:00.9322459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9322718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9322977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9323235Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0035s] [ 6%] 2025-09-07T09:35:00.9324510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 6%] 2025-09-07T09:35:00.9324788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 6%] 2025-09-07T09:35:00.9325054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.9325313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 6%] 2025-09-07T09:35:00.9325580Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.9325840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 6%] 2025-09-07T09:35:00.9326103Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.9326367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 6%] 2025-09-07T09:35:00.9326710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9326967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 6%] 2025-09-07T09:35:00.9327225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9327511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0036s] [ 6%] 2025-09-07T09:35:00.9327790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9328053Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 6%] 2025-09-07T09:35:00.9328316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.9328573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 6%] 2025-09-07T09:35:00.9328853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 6%] 2025-09-07T09:35:00.9329126Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0033s] [ 6%] 2025-09-07T09:35:00.9329386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 6%] 2025-09-07T09:35:00.9329646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9330908Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9331169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9331430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9331692Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9331952Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 6%] 2025-09-07T09:35:00.9332213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9332493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9332766Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 6%] 2025-09-07T09:35:00.9333028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 6%] 2025-09-07T09:35:00.9333288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 6%] 2025-09-07T09:35:00.9333548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 6%] 2025-09-07T09:35:00.9333836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0552s] [ 6%] 2025-09-07T09:35:00.9334116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0053s] [ 6%] 2025-09-07T09:35:00.9334378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0052s] [ 6%] 2025-09-07T09:35:00.9334643Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0052s] [ 6%] 2025-09-07T09:35:00.9334907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0057s] [ 7%] 2025-09-07T09:35:00.9335174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0059s] [ 7%] 2025-09-07T09:35:00.9335521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0031s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9335865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0017s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9336203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 7%] 2025-09-07T09:35:00.9336620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9337992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9338349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9338618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 7%] 2025-09-07T09:35:00.9338888Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 7%] 2025-09-07T09:35:00.9339242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 7%] 2025-09-07T09:35:00.9339522Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 7%] 2025-09-07T09:35:00.9339783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 7%] 2025-09-07T09:35:00.9340048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 7%] 2025-09-07T09:35:00.9340388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9340728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9341065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 7%] 2025-09-07T09:35:00.9341402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9341736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9342086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9342366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 7%] 2025-09-07T09:35:00.9342633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 7%] 2025-09-07T09:35:00.9342892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 7%] 2025-09-07T09:35:00.9343156Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 7%] 2025-09-07T09:35:00.9343433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 7%] 2025-09-07T09:35:00.9343712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 7%] 2025-09-07T09:35:00.9345056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9345396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9345734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 7%] 2025-09-07T09:35:00.9346070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9346408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9346819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9347088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0296s] [ 7%] 2025-09-07T09:35:00.9347385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0051s] [ 7%] 2025-09-07T09:35:00.9347665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0051s] [ 7%] 2025-09-07T09:35:00.9347931Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0050s] [ 7%] 2025-09-07T09:35:00.9348191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0056s] [ 7%] 2025-09-07T09:35:00.9348456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0058s] [ 7%] 2025-09-07T09:35:00.9348813Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0046s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9349167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9349504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 7%] 2025-09-07T09:35:00.9349839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0064s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9350174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9350510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 7%] 2025-09-07T09:35:00.9350783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0048s] [ 7%] 2025-09-07T09:35:00.9351056Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0056s] [ 7%] 2025-09-07T09:35:00.9352330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0054s] [ 7%] 2025-09-07T09:35:00.9352627Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0055s] [ 7%] 2025-09-07T09:35:00.9352913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0068s] [ 8%] 2025-09-07T09:35:00.9353180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0078s] [ 8%] 2025-09-07T09:35:00.9353521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0038s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9353877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0034s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9354230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 8%] 2025-09-07T09:35:00.9354566Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9354903Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9355238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0049s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9355505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 8%] 2025-09-07T09:35:00.9355774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 8%] 2025-09-07T09:35:00.9356037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 8%] 2025-09-07T09:35:00.9356304Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 8%] 2025-09-07T09:35:00.9356663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0038s] [ 8%] 2025-09-07T09:35:00.9356946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 8%] 2025-09-07T09:35:00.9357285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9357626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9357988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0018s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 8%] 2025-09-07T09:35:00.9358341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9359688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9360027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9360296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 8%] 2025-09-07T09:35:00.9360562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 8%] 2025-09-07T09:35:00.9360822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 8%] 2025-09-07T09:35:00.9361087Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 8%] 2025-09-07T09:35:00.9361347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 8%] 2025-09-07T09:35:00.9361609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 8%] 2025-09-07T09:35:00.9361967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9362317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9362651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 8%] 2025-09-07T09:35:00.9362985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9363346Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9363691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9363958Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0343s] [ 8%] 2025-09-07T09:35:00.9364229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0056s] [ 8%] 2025-09-07T09:35:00.9364493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0061s] [ 8%] 2025-09-07T09:35:00.9364757Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0056s] [ 8%] 2025-09-07T09:35:00.9365022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0066s] [ 8%] 2025-09-07T09:35:00.9365286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0075s] [ 8%] 2025-09-07T09:35:00.9365625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0036s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9367021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9367385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 8%] 2025-09-07T09:35:00.9367742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0034s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9368077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9368431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 8%] 2025-09-07T09:35:00.9368718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 8%] 2025-09-07T09:35:00.9368987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 8%] 2025-09-07T09:35:00.9369251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 8%] 2025-09-07T09:35:00.9369519Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 8%] 2025-09-07T09:35:00.9369782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 9%] 2025-09-07T09:35:00.9370046Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 9%] 2025-09-07T09:35:00.9370386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9370726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9371061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 9%] 2025-09-07T09:35:00.9371415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9371766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9372103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9372369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0067s] [ 9%] 2025-09-07T09:35:00.9372650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 9%] 2025-09-07T09:35:00.9372926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0073s] [ 9%] 2025-09-07T09:35:00.9374193Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 9%] 2025-09-07T09:35:00.9374460Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 9%] 2025-09-07T09:35:00.9374725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 9%] 2025-09-07T09:35:00.9375064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9375404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9375740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 9%] 2025-09-07T09:35:00.9376075Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9376408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9376853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9377138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 9%] 2025-09-07T09:35:00.9377408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 9%] 2025-09-07T09:35:00.9377670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 9%] 2025-09-07T09:35:00.9377978Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.9378260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 9%] 2025-09-07T09:35:00.9378524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 9%] 2025-09-07T09:35:00.9378861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9379239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9379575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 9%] 2025-09-07T09:35:00.9379910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9380243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9381587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9381872Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 9%] 2025-09-07T09:35:00.9382159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.9382422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 9%] 2025-09-07T09:35:00.9382688Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 9%] 2025-09-07T09:35:00.9382954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 9%] 2025-09-07T09:35:00.9383234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 9%] 2025-09-07T09:35:00.9383583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9383922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9384257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 9%] 2025-09-07T09:35:00.9384592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9384928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9385264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 9%] 2025-09-07T09:35:00.9385530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.9385796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 9%] 2025-09-07T09:35:00.9386072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 9%] 2025-09-07T09:35:00.9386347Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 9%] 2025-09-07T09:35:00.9386674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 10%] 2025-09-07T09:35:00.9386936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 10%] 2025-09-07T09:35:00.9387273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9388651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9389003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 10%] 2025-09-07T09:35:00.9389340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9389675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9390009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9390276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.9390543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 10%] 2025-09-07T09:35:00.9390805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 10%] 2025-09-07T09:35:00.9391066Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.9391344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 10%] 2025-09-07T09:35:00.9391622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.9391955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9392291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9392639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 10%] 2025-09-07T09:35:00.9392987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9393317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9393651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9393918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 10%] 2025-09-07T09:35:00.9394181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.9394440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.9394701Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 10%] 2025-09-07T09:35:00.9395954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 10%] 2025-09-07T09:35:00.9396215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.9396690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9397047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9397380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 10%] 2025-09-07T09:35:00.9397715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9398062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9398410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9398674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 10%] 2025-09-07T09:35:00.9398941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 10%] 2025-09-07T09:35:00.9399201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0102s] [ 10%] 2025-09-07T09:35:00.9399460Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 10%] 2025-09-07T09:35:00.9399719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 10%] 2025-09-07T09:35:00.9399979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 10%] 2025-09-07T09:35:00.9400314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0016s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9400650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9401001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 10%] 2025-09-07T09:35:00.9401344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9401675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9402020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 10%] 2025-09-07T09:35:00.9403310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 10%] 2025-09-07T09:35:00.9403582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 10%] 2025-09-07T09:35:00.9403848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 10%] 2025-09-07T09:35:00.9404113Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 10%] 2025-09-07T09:35:00.9404376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9404639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.9404905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 11%] 2025-09-07T09:35:00.9405174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 11%] 2025-09-07T09:35:00.9405436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 11%] 2025-09-07T09:35:00.9405700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 11%] 2025-09-07T09:35:00.9405982Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 11%] 2025-09-07T09:35:00.9406261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9406599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9406865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9407124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 11%] 2025-09-07T09:35:00.9407408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9407690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9407953Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.9408220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 11%] 2025-09-07T09:35:00.9408486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 11%] 2025-09-07T09:35:00.9409746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9410012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9410274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 11%] 2025-09-07T09:35:00.9410538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9410800Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9411094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9411375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 11%] 2025-09-07T09:35:00.9411640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.9411899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.9412162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.9412442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 11%] 2025-09-07T09:35:00.9412721Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9412984Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 11%] 2025-09-07T09:35:00.9413246Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9413509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.9413772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.9414036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 11%] 2025-09-07T09:35:00.9414302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.9414562Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 11%] 2025-09-07T09:35:00.9414825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9415097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9416354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 11%] 2025-09-07T09:35:00.9416679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9416947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9417213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0101s] [ 11%] 2025-09-07T09:35:00.9417503Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 11%] 2025-09-07T09:35:00.9417788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 11%] 2025-09-07T09:35:00.9418051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9418319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 11%] 2025-09-07T09:35:00.9418589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 11%] 2025-09-07T09:35:00.9418853Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9419185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 11%] 2025-09-07T09:35:00.9419448Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 12%] 2025-09-07T09:35:00.9419715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9419981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 12%] 2025-09-07T09:35:00.9420250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 12%] 2025-09-07T09:35:00.9420536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 12%] 2025-09-07T09:35:00.9420820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 12%] 2025-09-07T09:35:00.9421084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 12%] 2025-09-07T09:35:00.9421353Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 12%] 2025-09-07T09:35:00.9421631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 12%] 2025-09-07T09:35:00.9422910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.9423172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.9423435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.9423699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.9423960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.9424223Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 12%] 2025-09-07T09:35:00.9424493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 12%] 2025-09-07T09:35:00.9424755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.9425019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.9425279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 12%] 2025-09-07T09:35:00.9425563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 12%] 2025-09-07T09:35:00.9425840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 12%] 2025-09-07T09:35:00.9426106Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.9426364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.9426712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 12%] 2025-09-07T09:35:00.9426995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.9427272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 12%] 2025-09-07T09:35:00.9427536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 12%] 2025-09-07T09:35:00.9427801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9428065Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9428326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9429592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 12%] 2025-09-07T09:35:00.9429858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9430123Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.9430389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.9430669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.9430951Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 12%] 2025-09-07T09:35:00.9431212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 12%] 2025-09-07T09:35:00.9431474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 12%] 2025-09-07T09:35:00.9431738Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0091s] [ 12%] 2025-09-07T09:35:00.9432022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 12%] 2025-09-07T09:35:00.9432297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9432559Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 12%] 2025-09-07T09:35:00.9432821Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 12%] 2025-09-07T09:35:00.9433086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0036s] [ 12%] 2025-09-07T09:35:00.9433350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 12%] 2025-09-07T09:35:00.9433616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.9433877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.9434142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 12%] 2025-09-07T09:35:00.9434407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9434670Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9435937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 13%] 2025-09-07T09:35:00.9436220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 13%] 2025-09-07T09:35:00.9436545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 13%] 2025-09-07T09:35:00.9436810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 13%] 2025-09-07T09:35:00.9437108Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 13%] 2025-09-07T09:35:00.9437387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9437654Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9437918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 13%] 2025-09-07T09:35:00.9438179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 13%] 2025-09-07T09:35:00.9438440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9438698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 13%] 2025-09-07T09:35:00.9438960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 13%] 2025-09-07T09:35:00.9439224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 13%] 2025-09-07T09:35:00.9439490Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9439750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.9440033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9440309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 13%] 2025-09-07T09:35:00.9440570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9440832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 13%] 2025-09-07T09:35:00.9441097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 13%] 2025-09-07T09:35:00.9442376Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.9442651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9442910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9443173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9443437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 13%] 2025-09-07T09:35:00.9443702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 13%] 2025-09-07T09:35:00.9443963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 13%] 2025-09-07T09:35:00.9444224Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9444483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 13%] 2025-09-07T09:35:00.9444744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 13%] 2025-09-07T09:35:00.9445021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 13%] 2025-09-07T09:35:00.9445302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9445561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9445824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 13%] 2025-09-07T09:35:00.9446083Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 13%] 2025-09-07T09:35:00.9446361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 13%] 2025-09-07T09:35:00.9446701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 13%] 2025-09-07T09:35:00.9446966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 13%] 2025-09-07T09:35:00.9447225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 13%] 2025-09-07T09:35:00.9447491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.9447752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.9449015Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 13%] 2025-09-07T09:35:00.9449282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 13%] 2025-09-07T09:35:00.9449548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 13%] 2025-09-07T09:35:00.9449806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 13%] 2025-09-07T09:35:00.9450066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 13%] 2025-09-07T09:35:00.9450358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 14%] 2025-09-07T09:35:00.9450638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.9450901Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9451167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9451449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 14%] 2025-09-07T09:35:00.9451742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9452000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9452262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9452524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 14%] 2025-09-07T09:35:00.9452788Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9453045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 14%] 2025-09-07T09:35:00.9453304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.9453562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 14%] 2025-09-07T09:35:00.9453821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.9454080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9455344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9455623Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.9455887Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9456151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0037s] [ 14%] 2025-09-07T09:35:00.9456416Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9456781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 14%] 2025-09-07T09:35:00.9457112Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9457430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9457716Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.9457973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 14%] 2025-09-07T09:35:00.9458235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 14%] 2025-09-07T09:35:00.9458501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9458764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9459074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 14%] 2025-09-07T09:35:00.9459336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9459621Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 14%] 2025-09-07T09:35:00.9459898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 14%] 2025-09-07T09:35:00.9462906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 14%] 2025-09-07T09:35:00.9463181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9463450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 14%] 2025-09-07T09:35:00.9463739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9464018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 14%] 2025-09-07T09:35:00.9464282Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9464552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9464820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9465080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9465343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9465605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0039s] [ 14%] 2025-09-07T09:35:00.9465870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 14%] 2025-09-07T09:35:00.9466138Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 14%] 2025-09-07T09:35:00.9466402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9466779Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0078s] [ 14%] 2025-09-07T09:35:00.9467065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 14%] 2025-09-07T09:35:00.9469802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9470090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9470365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0043s] [ 15%] 2025-09-07T09:35:00.9470681Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.9470967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 15%] 2025-09-07T09:35:00.9471231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.9471499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.9471765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.9472030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 15%] 2025-09-07T09:35:00.9472300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9472560Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.9472826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9473084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 15%] 2025-09-07T09:35:00.9473366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9473644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 15%] 2025-09-07T09:35:00.9473911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.9474170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 15%] 2025-09-07T09:35:00.9474431Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.9474708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 15%] 2025-09-07T09:35:00.9474985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.9476805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 15%] 2025-09-07T09:35:00.9477079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9477340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.9477602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9477859Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.9478120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 15%] 2025-09-07T09:35:00.9478381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 15%] 2025-09-07T09:35:00.9478645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.9478905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 15%] 2025-09-07T09:35:00.9479210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.9479485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0040s] [ 15%] 2025-09-07T09:35:00.9479745Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 15%] 2025-09-07T09:35:00.9480005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.9480298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9480576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0041s] [ 15%] 2025-09-07T09:35:00.9480836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9481097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 15%] 2025-09-07T09:35:00.9481357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9481618Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 15%] 2025-09-07T09:35:00.9481880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.9482139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 15%] 2025-09-07T09:35:00.9483678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 15%] 2025-09-07T09:35:00.9483941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 15%] 2025-09-07T09:35:00.9484202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 15%] 2025-09-07T09:35:00.9484486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9484767Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 15%] 2025-09-07T09:35:00.9485028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9485291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 15%] 2025-09-07T09:35:00.9485553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9485833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.9486109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9486374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 16%] 2025-09-07T09:35:00.9486743Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9487007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9487270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9487533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.9487793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9488057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9488314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.9488597Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9488877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.9490370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.9490636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 16%] 2025-09-07T09:35:00.9490901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9491184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9491461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.9491720Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9491980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.9492246Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 16%] 2025-09-07T09:35:00.9492512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9492768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 16%] 2025-09-07T09:35:00.9493029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9493288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0036s] [ 16%] 2025-09-07T09:35:00.9493550Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 16%] 2025-09-07T09:35:00.9493810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9494090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9494362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9494626Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.9494882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 16%] 2025-09-07T09:35:00.9495142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9495419Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 16%] 2025-09-07T09:35:00.9495700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 16%] 2025-09-07T09:35:00.9497249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0037s] [ 16%] 2025-09-07T09:35:00.9497515Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9497775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 16%] 2025-09-07T09:35:00.9498037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 16%] 2025-09-07T09:35:00.9498298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 16%] 2025-09-07T09:35:00.9498564Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9498822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 16%] 2025-09-07T09:35:00.9499138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 16%] 2025-09-07T09:35:00.9499437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9499719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 16%] 2025-09-07T09:35:00.9499982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 16%] 2025-09-07T09:35:00.9500250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 16%] 2025-09-07T09:35:00.9500512Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9500795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 16%] 2025-09-07T09:35:00.9501071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9501332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9501595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9501862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9502127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9502389Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9503812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9504080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9504342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 17%] 2025-09-07T09:35:00.9504604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9504880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9505155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9505415Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9505675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 17%] 2025-09-07T09:35:00.9505955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 17%] 2025-09-07T09:35:00.9506230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9506563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9506824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9507083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.9507343Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9507605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 17%] 2025-09-07T09:35:00.9507868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9508132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9508393Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9508647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9508935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9509215Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 17%] 2025-09-07T09:35:00.9510639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9510899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9511158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9511444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 17%] 2025-09-07T09:35:00.9511722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9511983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 17%] 2025-09-07T09:35:00.9512245Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9512503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9512764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 17%] 2025-09-07T09:35:00.9513021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 17%] 2025-09-07T09:35:00.9513281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9513545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9513810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9514066Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9514339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 17%] 2025-09-07T09:35:00.9514609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9514870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 17%] 2025-09-07T09:35:00.9515132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 17%] 2025-09-07T09:35:00.9515415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9515685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 17%] 2025-09-07T09:35:00.9517161Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 17%] 2025-09-07T09:35:00.9517422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 18%] 2025-09-07T09:35:00.9517682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 18%] 2025-09-07T09:35:00.9517944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 18%] 2025-09-07T09:35:00.9518205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9518464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 18%] 2025-09-07T09:35:00.9518725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9518981Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0079s] [ 18%] 2025-09-07T09:35:00.9519241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9519530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 18%] 2025-09-07T09:35:00.9519807Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9520061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 18%] 2025-09-07T09:35:00.9520318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9520573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9520858Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 18%] 2025-09-07T09:35:00.9521133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 18%] 2025-09-07T09:35:00.9521395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9521655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 18%] 2025-09-07T09:35:00.9521915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9522169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 18%] 2025-09-07T09:35:00.9522435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.9523837Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 18%] 2025-09-07T09:35:00.9524098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9524352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 18%] 2025-09-07T09:35:00.9524607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9524885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9525155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 18%] 2025-09-07T09:35:00.9525412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 18%] 2025-09-07T09:35:00.9525673Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9525929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.9526204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.9526546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 18%] 2025-09-07T09:35:00.9526802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.9527063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 18%] 2025-09-07T09:35:00.9527324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9527580Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 18%] 2025-09-07T09:35:00.9527835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9528093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 18%] 2025-09-07T09:35:00.9528350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 18%] 2025-09-07T09:35:00.9528607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 18%] 2025-09-07T09:35:00.9528893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9530342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 18%] 2025-09-07T09:35:00.9530609Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 18%] 2025-09-07T09:35:00.9530864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0033s] [ 18%] 2025-09-07T09:35:00.9531121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 18%] 2025-09-07T09:35:00.9531421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0824s] [ 18%] 2025-09-07T09:35:00.9531710Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0083s] [ 18%] 2025-09-07T09:35:00.9531973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0083s] [ 18%] 2025-09-07T09:35:00.9532240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0083s] [ 18%] 2025-09-07T09:35:00.9532505Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0109s] [ 19%] 2025-09-07T09:35:00.9532774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0118s] [ 19%] 2025-09-07T09:35:00.9533041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0102s] [ 19%] 2025-09-07T09:35:00.9533312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0111s] [ 19%] 2025-09-07T09:35:00.9533578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0112s] [ 19%] 2025-09-07T09:35:00.9533845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0112s] [ 19%] 2025-09-07T09:35:00.9534107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0143s] [ 19%] 2025-09-07T09:35:00.9534392Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0149s] [ 19%] 2025-09-07T09:35:00.9534677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 19%] 2025-09-07T09:35:00.9534943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0039s] [ 19%] 2025-09-07T09:35:00.9535204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0040s] [ 19%] 2025-09-07T09:35:00.9535484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0039s] [ 19%] 2025-09-07T09:35:00.9536974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0047s] [ 19%] 2025-09-07T09:35:00.9537243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0048s] [ 19%] 2025-09-07T09:35:00.9537510Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0049s] [ 19%] 2025-09-07T09:35:00.9537781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0049s] [ 19%] 2025-09-07T09:35:00.9538045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0049s] [ 19%] 2025-09-07T09:35:00.9538309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0049s] [ 19%] 2025-09-07T09:35:00.9538571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0060s] [ 19%] 2025-09-07T09:35:00.9538838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0060s] [ 19%] 2025-09-07T09:35:00.9539163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 19%] 2025-09-07T09:35:00.9539429Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0041s] [ 19%] 2025-09-07T09:35:00.9539729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0041s] [ 19%] 2025-09-07T09:35:00.9540012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 19%] 2025-09-07T09:35:00.9540273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0047s] [ 19%] 2025-09-07T09:35:00.9540535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0049s] [ 19%] 2025-09-07T09:35:00.9540820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0051s] [ 19%] 2025-09-07T09:35:00.9541104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0051s] [ 19%] 2025-09-07T09:35:00.9541367Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0052s] [ 19%] 2025-09-07T09:35:00.9541629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0051s] [ 19%] 2025-09-07T09:35:00.9541891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0062s] [ 19%] 2025-09-07T09:35:00.9542156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0061s] [ 19%] 2025-09-07T09:35:00.9542421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0693s] [ 19%] 2025-09-07T09:35:00.9543839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0074s] [ 19%] 2025-09-07T09:35:00.9544104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0073s] [ 19%] 2025-09-07T09:35:00.9544369Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0073s] [ 19%] 2025-09-07T09:35:00.9544631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0096s] [ 19%] 2025-09-07T09:35:00.9544915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0107s] [ 19%] 2025-09-07T09:35:00.9545196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0093s] [ 19%] 2025-09-07T09:35:00.9545465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0101s] [ 19%] 2025-09-07T09:35:00.9545726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0100s] [ 19%] 2025-09-07T09:35:00.9545991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0101s] [ 19%] 2025-09-07T09:35:00.9546268Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0131s] [ 19%] 2025-09-07T09:35:00.9546625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0132s] [ 19%] 2025-09-07T09:35:00.9546892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0133s] [ 19%] 2025-09-07T09:35:00.9547160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0115s] [ 19%] 2025-09-07T09:35:00.9547426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0114s] [ 19%] 2025-09-07T09:35:00.9547691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0115s] [ 19%] 2025-09-07T09:35:00.9547955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0162s] [ 20%] 2025-09-07T09:35:00.9548221Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0171s] [ 20%] 2025-09-07T09:35:00.9548489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0143s] [ 20%] 2025-09-07T09:35:00.9548760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0156s] [ 20%] 2025-09-07T09:35:00.9549055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0156s] [ 20%] 2025-09-07T09:35:00.9551688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0149s] [ 20%] 2025-09-07T09:35:00.9551954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0204s] [ 20%] 2025-09-07T09:35:00.9552223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0202s] [ 20%] 2025-09-07T09:35:00.9552492Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0058s] [ 20%] 2025-09-07T09:35:00.9552785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0064s] [ 20%] 2025-09-07T09:35:00.9553064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0078s] [ 20%] 2025-09-07T09:35:00.9553327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0086s] [ 20%] 2025-09-07T09:35:00.9553590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0099s] [ 20%] 2025-09-07T09:35:00.9553854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0099s] [ 20%] 2025-09-07T09:35:00.9554119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0098s] [ 20%] 2025-09-07T09:35:00.9554387Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0104s] [ 20%] 2025-09-07T09:35:00.9554651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0104s] [ 20%] 2025-09-07T09:35:00.9554917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0104s] [ 20%] 2025-09-07T09:35:00.9555177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0130s] [ 20%] 2025-09-07T09:35:00.9555463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0127s] [ 20%] 2025-09-07T09:35:00.9555740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0058s] [ 20%] 2025-09-07T09:35:00.9556008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 20%] 2025-09-07T09:35:00.9556267Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0058s] [ 20%] 2025-09-07T09:35:00.9556598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0058s] [ 20%] 2025-09-07T09:35:00.9556879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0075s] [ 20%] 2025-09-07T09:35:00.9557158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0076s] [ 20%] 2025-09-07T09:35:00.9558567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0077s] [ 20%] 2025-09-07T09:35:00.9558837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0077s] [ 20%] 2025-09-07T09:35:00.9559103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0075s] [ 20%] 2025-09-07T09:35:00.9559367Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0076s] [ 20%] 2025-09-07T09:35:00.9559629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0101s] [ 20%] 2025-09-07T09:35:00.9559894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0101s] [ 20%] 2025-09-07T09:35:00.9560161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0429s] [ 20%] 2025-09-07T09:35:00.9560430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0104s] [ 20%] 2025-09-07T09:35:00.9560690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0104s] [ 20%] 2025-09-07T09:35:00.9560987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0104s] [ 20%] 2025-09-07T09:35:00.9561266Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0145s] [ 20%] 2025-09-07T09:35:00.9561531Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0155s] [ 20%] 2025-09-07T09:35:00.9561796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0133s] [ 20%] 2025-09-07T09:35:00.9562078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0138s] [ 20%] 2025-09-07T09:35:00.9562354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0095s] [ 20%] 2025-09-07T09:35:00.9562619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0097s] [ 20%] 2025-09-07T09:35:00.9562881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0179s] [ 20%] 2025-09-07T09:35:00.9563148Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0180s] [ 20%] 2025-09-07T09:35:00.9563417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0079s] [ 20%] 2025-09-07T09:35:00.9563685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0063s] [ 20%] 2025-09-07T09:35:00.9565078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0063s] [ 20%] 2025-09-07T09:35:00.9565348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0063s] [ 20%] 2025-09-07T09:35:00.9565613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0074s] [ 21%] 2025-09-07T09:35:00.9565881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0081s] [ 21%] 2025-09-07T09:35:00.9566167Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0076s] [ 21%] 2025-09-07T09:35:00.9566451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0077s] [ 21%] 2025-09-07T09:35:00.9566794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0077s] [ 21%] 2025-09-07T09:35:00.9567062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0225s] [ 21%] 2025-09-07T09:35:00.9567326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0574s] [ 21%] 2025-09-07T09:35:00.9567627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0780s] [ 21%] 2025-09-07T09:35:00.9567908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 21%] 2025-09-07T09:35:00.9568178Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 21%] 2025-09-07T09:35:00.9568440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 21%] 2025-09-07T09:35:00.9568703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 21%] 2025-09-07T09:35:00.9568963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 21%] 2025-09-07T09:35:00.9569226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 21%] 2025-09-07T09:35:00.9569492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 21%] 2025-09-07T09:35:00.9569760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0040s] [ 21%] 2025-09-07T09:35:00.9570024Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0041s] [ 21%] 2025-09-07T09:35:00.9570319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 21%] 2025-09-07T09:35:00.9570597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0039s] [ 21%] 2025-09-07T09:35:00.9572011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 21%] 2025-09-07T09:35:00.9572275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 21%] 2025-09-07T09:35:00.9572543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 21%] 2025-09-07T09:35:00.9572825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 21%] 2025-09-07T09:35:00.9573109Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 21%] 2025-09-07T09:35:00.9573371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 21%] 2025-09-07T09:35:00.9573638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 21%] 2025-09-07T09:35:00.9573904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 21%] 2025-09-07T09:35:00.9574172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0040s] [ 21%] 2025-09-07T09:35:00.9574434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 21%] 2025-09-07T09:35:00.9574699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 21%] 2025-09-07T09:35:00.9574961Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 21%] 2025-09-07T09:35:00.9575226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 21%] 2025-09-07T09:35:00.9575512Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.5608s] [ 21%] 2025-09-07T09:35:00.9575797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0431s] [ 21%] 2025-09-07T09:35:00.9576059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0515s] [ 21%] 2025-09-07T09:35:00.9576324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0542s] [ 21%] 2025-09-07T09:35:00.9576651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0476s] [ 21%] 2025-09-07T09:35:00.9576947Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0653s] [ 21%] 2025-09-07T09:35:00.9577230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0851s] [ 21%] 2025-09-07T09:35:00.9578652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0714s] [ 21%] 2025-09-07T09:35:00.9578919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0793s] [ 21%] 2025-09-07T09:35:00.9579257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0678s] [ 21%] 2025-09-07T09:35:00.9579520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0615s] [ 21%] 2025-09-07T09:35:00.9579783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0917s] [ 21%] 2025-09-07T09:35:00.9580048Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 21%] 2025-09-07T09:35:00.9580315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 21%] 2025-09-07T09:35:00.9580575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 21%] 2025-09-07T09:35:00.9580838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 21%] 2025-09-07T09:35:00.9581134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.9581418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.9581683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 22%] 2025-09-07T09:35:00.9581949Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 22%] 2025-09-07T09:35:00.9582225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 22%] 2025-09-07T09:35:00.9582502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 22%] 2025-09-07T09:35:00.9582762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 22%] 2025-09-07T09:35:00.9583024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.9583287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9583551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9583810Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9584070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9585486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9585748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.9586009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.9586297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.9586644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.9586906Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.9587166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 22%] 2025-09-07T09:35:00.9587428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 22%] 2025-09-07T09:35:00.9587727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9588009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9588267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9588526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9588785Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9589045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9589308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.9589571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 22%] 2025-09-07T09:35:00.9589832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 22%] 2025-09-07T09:35:00.9590094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 22%] 2025-09-07T09:35:00.9590350Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9590637Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9592072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 22%] 2025-09-07T09:35:00.9592340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 22%] 2025-09-07T09:35:00.9592598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 22%] 2025-09-07T09:35:00.9592889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 22%] 2025-09-07T09:35:00.9593162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9593420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9593683Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9593948Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 22%] 2025-09-07T09:35:00.9594207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 22%] 2025-09-07T09:35:00.9594467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.9594727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0033s] [ 22%] 2025-09-07T09:35:00.9594992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 22%] 2025-09-07T09:35:00.9595260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.4734s] [ 22%] 2025-09-07T09:35:00.9595529Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0467s] [ 22%] 2025-09-07T09:35:00.9595804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0584s] [ 22%] 2025-09-07T09:35:00.9596084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0500s] [ 22%] 2025-09-07T09:35:00.9596346Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0677s] [ 23%] 2025-09-07T09:35:00.9596687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1190s] [ 23%] 2025-09-07T09:35:00.9597058Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.1070s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9597424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0016s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9598912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0149s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9599255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9599594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9599932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9600200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0079s] [ 23%] 2025-09-07T09:35:00.9600473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0475s] [ 23%] 2025-09-07T09:35:00.9600737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1226s] [ 23%] 2025-09-07T09:35:00.9601002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0516s] [ 23%] 2025-09-07T09:35:00.9601288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0577s] [ 23%] 2025-09-07T09:35:00.9601570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0901s] [ 23%] 2025-09-07T09:35:00.9601908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0813s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9602249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9602597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9602946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0021s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9603281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0097s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9603616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9603883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0065s] [ 23%] 2025-09-07T09:35:00.9604149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0064s] [ 23%] 2025-09-07T09:35:00.9604409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0062s] [ 23%] 2025-09-07T09:35:00.9604672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0061s] [ 23%] 2025-09-07T09:35:00.9604934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0076s] [ 23%] 2025-09-07T09:35:00.9606348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0077s] [ 23%] 2025-09-07T09:35:00.9606764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9607104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9607438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9607795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9608145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9608481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9608747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.2075s] [ 23%] 2025-09-07T09:35:00.9609016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0495s] [ 23%] 2025-09-07T09:35:00.9609278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0549s] [ 23%] 2025-09-07T09:35:00.9609543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0563s] [ 23%] 2025-09-07T09:35:00.9609805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0599s] [ 23%] 2025-09-07T09:35:00.9610071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1159s] [ 23%] 2025-09-07T09:35:00.9610410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.1033s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9610770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9611127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9611461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0519s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9611795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 23%] 2025-09-07T09:35:00.9612144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0049s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 23%] 2025-09-07T09:35:00.9612426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0152s] [ 23%] 2025-09-07T09:35:00.9613845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0721s] [ 23%] 2025-09-07T09:35:00.9614114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0837s] [ 23%] 2025-09-07T09:35:00.9614381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0838s] [ 23%] 2025-09-07T09:35:00.9614644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1021s] [ 24%] 2025-09-07T09:35:00.9614911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0655s] [ 24%] 2025-09-07T09:35:00.9615251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0507s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9615596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0298s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9615952Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9616303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9616728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9617065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9617362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0198s] [ 24%] 2025-09-07T09:35:00.9617647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0100s] [ 24%] 2025-09-07T09:35:00.9617908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0200s] [ 24%] 2025-09-07T09:35:00.9618173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0299s] [ 24%] 2025-09-07T09:35:00.9618438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0332s] [ 24%] 2025-09-07T09:35:00.9618702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0625s] [ 24%] 2025-09-07T09:35:00.9619083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0434s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9619423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9619758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9621242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0078s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9621602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9621958Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9622225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0101s] [ 24%] 2025-09-07T09:35:00.9622491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0254s] [ 24%] 2025-09-07T09:35:00.9622766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0321s] [ 24%] 2025-09-07T09:35:00.9623043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0355s] [ 24%] 2025-09-07T09:35:00.9623302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0390s] [ 24%] 2025-09-07T09:35:00.9623564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0384s] [ 24%] 2025-09-07T09:35:00.9623902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0032s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9624243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0054s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9624579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9624914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9625249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9625597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9625877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0165s] [ 24%] 2025-09-07T09:35:00.9626146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0231s] [ 24%] 2025-09-07T09:35:00.9626409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0386s] [ 24%] 2025-09-07T09:35:00.9626755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0351s] [ 24%] 2025-09-07T09:35:00.9627045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0401s] [ 24%] 2025-09-07T09:35:00.9628469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0581s] [ 24%] 2025-09-07T09:35:00.9628810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0414s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9629151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9629488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9629824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0052s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9630159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 24%] 2025-09-07T09:35:00.9630496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 24%] 2025-09-07T09:35:00.9630766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0057s] [ 24%] 2025-09-07T09:35:00.9631080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0203s] [ 24%] 2025-09-07T09:35:00.9631365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0208s] [ 24%] 2025-09-07T09:35:00.9631631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0223s] [ 25%] 2025-09-07T09:35:00.9631893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0106s] [ 25%] 2025-09-07T09:35:00.9632161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0107s] [ 25%] 2025-09-07T09:35:00.9632517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0039s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9632873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0842s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9633209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9633547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9633883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9634221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9634489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0047s] [ 25%] 2025-09-07T09:35:00.9634757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0039s] [ 25%] 2025-09-07T09:35:00.9636144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0040s] [ 25%] 2025-09-07T09:35:00.9636432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0039s] [ 25%] 2025-09-07T09:35:00.9636771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0047s] [ 25%] 2025-09-07T09:35:00.9637037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0046s] [ 25%] 2025-09-07T09:35:00.9637373Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9637746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9638098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9638434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9638769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9639103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9639368Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0042s] [ 25%] 2025-09-07T09:35:00.9639635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0042s] [ 25%] 2025-09-07T09:35:00.9639901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0043s] [ 25%] 2025-09-07T09:35:00.9640165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0041s] [ 25%] 2025-09-07T09:35:00.9640426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0045s] [ 25%] 2025-09-07T09:35:00.9640715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0046s] [ 25%] 2025-09-07T09:35:00.9641068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9641407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9641741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9642095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9643595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9643932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9644200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.5204s] [ 25%] 2025-09-07T09:35:00.9644468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0602s] [ 25%] 2025-09-07T09:35:00.9644730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0711s] [ 25%] 2025-09-07T09:35:00.9644995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0479s] [ 25%] 2025-09-07T09:35:00.9645257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0595s] [ 25%] 2025-09-07T09:35:00.9645522Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1064s] [ 25%] 2025-09-07T09:35:00.9645859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0219s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9646221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9646651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9646985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0057s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9647349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 25%] 2025-09-07T09:35:00.9647706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 25%] 2025-09-07T09:35:00.9647976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 25%] 2025-09-07T09:35:00.9648241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 25%] 2025-09-07T09:35:00.9648505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 25%] 2025-09-07T09:35:00.9648767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 26%] 2025-09-07T09:35:00.9649028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0041s] [ 26%] 2025-09-07T09:35:00.9649290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0041s] [ 26%] 2025-09-07T09:35:00.9650777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9651121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9651478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9651829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9652164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9652497Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9652776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 26%] 2025-09-07T09:35:00.9653052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 26%] 2025-09-07T09:35:00.9653312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 26%] 2025-09-07T09:35:00.9653574Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 26%] 2025-09-07T09:35:00.9653835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 26%] 2025-09-07T09:35:00.9654096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 26%] 2025-09-07T09:35:00.9654429Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9654765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9655096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9655427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9655777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9656120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9656383Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 26%] 2025-09-07T09:35:00.9656723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 26%] 2025-09-07T09:35:00.9658166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 26%] 2025-09-07T09:35:00.9658447Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 26%] 2025-09-07T09:35:00.9658705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 26%] 2025-09-07T09:35:00.9659033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 26%] 2025-09-07T09:35:00.9659370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9659704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9660035Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9660367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9660699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9661057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9661339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 26%] 2025-09-07T09:35:00.9661604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 26%] 2025-09-07T09:35:00.9661861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 26%] 2025-09-07T09:35:00.9662122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 26%] 2025-09-07T09:35:00.9662399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0038s] [ 26%] 2025-09-07T09:35:00.9662670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 26%] 2025-09-07T09:35:00.9663003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9663339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9663673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9664003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9664334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 26%] 2025-09-07T09:35:00.9665804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 26%] 2025-09-07T09:35:00.9666074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.1969s] [ 26%] 2025-09-07T09:35:00.9666361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0359s] [ 26%] 2025-09-07T09:35:00.9666727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0390s] [ 26%] 2025-09-07T09:35:00.9666994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0316s] [ 27%] 2025-09-07T09:35:00.9667256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0067s] [ 27%] 2025-09-07T09:35:00.9667521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0189s] [ 27%] 2025-09-07T09:35:00.9667814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0793s] [ 27%] 2025-09-07T09:35:00.9668100Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0357s] [ 27%] 2025-09-07T09:35:00.9668363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0871s] [ 27%] 2025-09-07T09:35:00.9668630Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0433s] [ 27%] 2025-09-07T09:35:00.9668893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0476s] [ 27%] 2025-09-07T09:35:00.9669162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0675s] [ 27%] 2025-09-07T09:35:00.9669425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 27%] 2025-09-07T09:35:00.9669692Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 27%] 2025-09-07T09:35:00.9669952Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 27%] 2025-09-07T09:35:00.9670215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 27%] 2025-09-07T09:35:00.9670492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0035s] [ 27%] 2025-09-07T09:35:00.9670772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 27%] 2025-09-07T09:35:00.9671037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0052s] [ 27%] 2025-09-07T09:35:00.9672453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0046s] [ 27%] 2025-09-07T09:35:00.9672717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9673001Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0042s] [ 27%] 2025-09-07T09:35:00.9673274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9673536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 27%] 2025-09-07T09:35:00.9673802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 27%] 2025-09-07T09:35:00.9674069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0036s] [ 27%] 2025-09-07T09:35:00.9674328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0037s] [ 27%] 2025-09-07T09:35:00.9674591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 27%] 2025-09-07T09:35:00.9674855Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 27%] 2025-09-07T09:35:00.9675117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0035s] [ 27%] 2025-09-07T09:35:00.9675381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9675646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9675923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0042s] [ 27%] 2025-09-07T09:35:00.9676198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9676459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9676785Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0041s] [ 27%] 2025-09-07T09:35:00.9677095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.7698s] [ 27%] 2025-09-07T09:35:00.9677379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0361s] [ 27%] 2025-09-07T09:35:00.9677638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0391s] [ 27%] 2025-09-07T09:35:00.9679042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0387s] [ 27%] 2025-09-07T09:35:00.9679309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0238s] [ 27%] 2025-09-07T09:35:00.9679573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0817s] [ 27%] 2025-09-07T09:35:00.9679836Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0794s] [ 27%] 2025-09-07T09:35:00.9680104Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0221s] [ 27%] 2025-09-07T09:35:00.9680369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0629s] [ 27%] 2025-09-07T09:35:00.9680633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0561s] [ 27%] 2025-09-07T09:35:00.9680893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0451s] [ 27%] 2025-09-07T09:35:00.9681187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0828s] [ 27%] 2025-09-07T09:35:00.9681471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0803s] [ 27%] 2025-09-07T09:35:00.9681739Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0544s] [ 27%] 2025-09-07T09:35:00.9682001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0786s] [ 27%] 2025-09-07T09:35:00.9682266Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0695s] [ 28%] 2025-09-07T09:35:00.9682550Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0729s] [ 28%] 2025-09-07T09:35:00.9682829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0720s] [ 28%] 2025-09-07T09:35:00.9683094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0257s] [ 28%] 2025-09-07T09:35:00.9683364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0868s] [ 28%] 2025-09-07T09:35:00.9683628Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0731s] [ 28%] 2025-09-07T09:35:00.9683894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0578s] [ 28%] 2025-09-07T09:35:00.9684156Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0824s] [ 28%] 2025-09-07T09:35:00.9684423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.1035s] [ 28%] 2025-09-07T09:35:00.9685806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 28%] 2025-09-07T09:35:00.9686073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 28%] 2025-09-07T09:35:00.9686354Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 28%] 2025-09-07T09:35:00.9686716Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 28%] 2025-09-07T09:35:00.9686978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 28%] 2025-09-07T09:35:00.9687240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 28%] 2025-09-07T09:35:00.9687504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0044s] [ 28%] 2025-09-07T09:35:00.9687806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0045s] [ 28%] 2025-09-07T09:35:00.9688083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0042s] [ 28%] 2025-09-07T09:35:00.9688348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0042s] [ 28%] 2025-09-07T09:35:00.9688610Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0050s] [ 28%] 2025-09-07T09:35:00.9688874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0050s] [ 28%] 2025-09-07T09:35:00.9689138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 28%] 2025-09-07T09:35:00.9689403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 28%] 2025-09-07T09:35:00.9689663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0038s] [ 28%] 2025-09-07T09:35:00.9689926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 28%] 2025-09-07T09:35:00.9690189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 28%] 2025-09-07T09:35:00.9690452Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 28%] 2025-09-07T09:35:00.9690734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0045s] [ 28%] 2025-09-07T09:35:00.9691016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0046s] [ 28%] 2025-09-07T09:35:00.9692410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0045s] [ 28%] 2025-09-07T09:35:00.9692675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0045s] [ 28%] 2025-09-07T09:35:00.9692953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0051s] [ 28%] 2025-09-07T09:35:00.9693230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0050s] [ 28%] 2025-09-07T09:35:00.9693493Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.7866s] [ 28%] 2025-09-07T09:35:00.9693761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0342s] [ 28%] 2025-09-07T09:35:00.9694021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0299s] [ 28%] 2025-09-07T09:35:00.9694284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0128s] [ 28%] 2025-09-07T09:35:00.9694544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0091s] [ 28%] 2025-09-07T09:35:00.9694806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0262s] [ 28%] 2025-09-07T09:35:00.9695072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0360s] [ 28%] 2025-09-07T09:35:00.9695340Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0309s] [ 28%] 2025-09-07T09:35:00.9695600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0053s] [ 28%] 2025-09-07T09:35:00.9695879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0055s] [ 28%] 2025-09-07T09:35:00.9696154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0403s] [ 28%] 2025-09-07T09:35:00.9696417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0541s] [ 28%] 2025-09-07T09:35:00.9696755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 28%] 2025-09-07T09:35:00.9697022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0036s] [ 28%] 2025-09-07T09:35:00.9697312Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 28%] 2025-09-07T09:35:00.9697591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9697850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9699470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9699740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 29%] 2025-09-07T09:35:00.9700010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9700272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9700539Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0039s] [ 29%] 2025-09-07T09:35:00.9700802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0042s] [ 29%] 2025-09-07T09:35:00.9701066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0043s] [ 29%] 2025-09-07T09:35:00.9701362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 29%] 2025-09-07T09:35:00.9701644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9701907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9702167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 29%] 2025-09-07T09:35:00.9702430Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0037s] [ 29%] 2025-09-07T09:35:00.9702717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0036s] [ 29%] 2025-09-07T09:35:00.9702994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9703259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9703518Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9703782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9704043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 29%] 2025-09-07T09:35:00.9704306Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9704566Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 29%] 2025-09-07T09:35:00.9705968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9706230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9706598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 29%] 2025-09-07T09:35:00.9706875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 29%] 2025-09-07T09:35:00.9707137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0036s] [ 29%] 2025-09-07T09:35:00.9707403Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 29%] 2025-09-07T09:35:00.9707670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9707959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 29%] 2025-09-07T09:35:00.9708236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 29%] 2025-09-07T09:35:00.9708494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9708757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9709021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 29%] 2025-09-07T09:35:00.9709289Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 29%] 2025-09-07T09:35:00.9709551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 29%] 2025-09-07T09:35:00.9709816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 29%] 2025-09-07T09:35:00.9710075Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 29%] 2025-09-07T09:35:00.9710338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 29%] 2025-09-07T09:35:00.9710602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 29%] 2025-09-07T09:35:00.9710881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 29%] 2025-09-07T09:35:00.9711155Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9711417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 29%] 2025-09-07T09:35:00.9712824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9713088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 29%] 2025-09-07T09:35:00.9713375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 29%] 2025-09-07T09:35:00.9713651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.9713907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 29%] 2025-09-07T09:35:00.9714169Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.9714427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 30%] 2025-09-07T09:35:00.9714687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9714950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9715217Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9715479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9715740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9716017Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9716292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9716622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9716883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9717142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9717423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9717699Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9717958Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.9719380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9719649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9719906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9720166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9720424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 30%] 2025-09-07T09:35:00.9720686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 30%] 2025-09-07T09:35:00.9720948Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9721209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9721508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9721784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 30%] 2025-09-07T09:35:00.9722039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.9722297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 30%] 2025-09-07T09:35:00.9722576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9722861Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9723117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9723378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9723636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9723895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9724153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 30%] 2025-09-07T09:35:00.9724417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9724674Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 30%] 2025-09-07T09:35:00.9726100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.9726362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.9726741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 30%] 2025-09-07T09:35:00.9727030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9727293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9727549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9727808Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 30%] 2025-09-07T09:35:00.9728100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9728378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_312_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 30%] 2025-09-07T09:35:00.9728645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 30%] 2025-09-07T09:35:00.9728912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 30%] 2025-09-07T09:35:00.9729172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 30%] 2025-09-07T09:35:00.9729434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 31%] 2025-09-07T09:35:00.9729692Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 31%] 2025-09-07T09:35:00.9729954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 31%] 2025-09-07T09:35:00.9730218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 31%] 2025-09-07T09:35:00.9730483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 31%] 2025-09-07T09:35:00.9730765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9731039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 31%] 2025-09-07T09:35:00.9731299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9732707Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9732973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 31%] 2025-09-07T09:35:00.9733253Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 31%] 2025-09-07T09:35:00.9733526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 31%] 2025-09-07T09:35:00.9733787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 31%] 2025-09-07T09:35:00.9734047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 31%] 2025-09-07T09:35:00.9734310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0042s] [ 31%] 2025-09-07T09:35:00.9734572Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0043s] [ 31%] 2025-09-07T09:35:00.9734836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 31%] 2025-09-07T09:35:00.9735095Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9735358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 31%] 2025-09-07T09:35:00.9735618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 31%] 2025-09-07T09:35:00.9735878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 31%] 2025-09-07T09:35:00.9736162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9736439Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 31%] 2025-09-07T09:35:00.9736752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 31%] 2025-09-07T09:35:00.9737010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 31%] 2025-09-07T09:35:00.9737267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 31%] 2025-09-07T09:35:00.9737562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 31%] 2025-09-07T09:35:00.9737848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 31%] 2025-09-07T09:35:00.9739297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9739559Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9739824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9740082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 31%] 2025-09-07T09:35:00.9740344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 31%] 2025-09-07T09:35:00.9740605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 31%] 2025-09-07T09:35:00.9740869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 31%] 2025-09-07T09:35:00.9741127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 31%] 2025-09-07T09:35:00.9741425Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 31%] 2025-09-07T09:35:00.9741713Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 31%] 2025-09-07T09:35:00.9741976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 31%] 2025-09-07T09:35:00.9742238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 31%] 2025-09-07T09:35:00.9742501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 31%] 2025-09-07T09:35:00.9742789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9743062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9743321Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 31%] 2025-09-07T09:35:00.9743582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0044s] [ 31%] 2025-09-07T09:35:00.9743846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 31%] 2025-09-07T09:35:00.9744114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 31%] 2025-09-07T09:35:00.9744373Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 31%] 2025-09-07T09:35:00.9744638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 32%] 2025-09-07T09:35:00.9746027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0041s] [ 32%] 2025-09-07T09:35:00.9746292Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 32%] 2025-09-07T09:35:00.9746640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 32%] 2025-09-07T09:35:00.9746943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0036s] [ 32%] 2025-09-07T09:35:00.9747222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0039s] [ 32%] 2025-09-07T09:35:00.9747486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0039s] [ 32%] 2025-09-07T09:35:00.9747749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0043s] [ 32%] 2025-09-07T09:35:00.9748030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 32%] 2025-09-07T09:35:00.9748306Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 32%] 2025-09-07T09:35:00.9748570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9748827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 32%] 2025-09-07T09:35:00.9749089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9749344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9749605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 32%] 2025-09-07T09:35:00.9749869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 32%] 2025-09-07T09:35:00.9750134Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 32%] 2025-09-07T09:35:00.9750392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 32%] 2025-09-07T09:35:00.9750652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 32%] 2025-09-07T09:35:00.9750930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 32%] 2025-09-07T09:35:00.9751205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 32%] 2025-09-07T09:35:00.9752609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 32%] 2025-09-07T09:35:00.9752873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9753133Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9753412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9753684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9753943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9754202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 32%] 2025-09-07T09:35:00.9754466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 32%] 2025-09-07T09:35:00.9754722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 32%] 2025-09-07T09:35:00.9754982Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 32%] 2025-09-07T09:35:00.9755240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 32%] 2025-09-07T09:35:00.9755502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 32%] 2025-09-07T09:35:00.9755763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 32%] 2025-09-07T09:35:00.9756038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 32%] 2025-09-07T09:35:00.9756308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9756645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 32%] 2025-09-07T09:35:00.9756906Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0036s] [ 32%] 2025-09-07T09:35:00.9757167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0042s] [ 32%] 2025-09-07T09:35:00.9757463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 32%] 2025-09-07T09:35:00.9757743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 32%] 2025-09-07T09:35:00.9758001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 32%] 2025-09-07T09:35:00.9759393Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 32%] 2025-09-07T09:35:00.9759654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 32%] 2025-09-07T09:35:00.9759915Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 32%] 2025-09-07T09:35:00.9760178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 32%] 2025-09-07T09:35:00.9760442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 32%] 2025-09-07T09:35:00.9760701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 32%] 2025-09-07T09:35:00.9760965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9761222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 33%] 2025-09-07T09:35:00.9761514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9761800Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9762065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 33%] 2025-09-07T09:35:00.9762324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 33%] 2025-09-07T09:35:00.9762585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 33%] 2025-09-07T09:35:00.9762861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 33%] 2025-09-07T09:35:00.9763138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9763398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9763661Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9763918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9764177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9764436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 33%] 2025-09-07T09:35:00.9765812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 33%] 2025-09-07T09:35:00.9766076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 33%] 2025-09-07T09:35:00.9766344Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 33%] 2025-09-07T09:35:00.9766694Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9766971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9767229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9767489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9767750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 33%] 2025-09-07T09:35:00.9768038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9768311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9768572Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9768830Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9769090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 33%] 2025-09-07T09:35:00.9769349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9769611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9769867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9770128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9770384Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9770645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9770921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 33%] 2025-09-07T09:35:00.9771196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 33%] 2025-09-07T09:35:00.9772592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9772852Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 33%] 2025-09-07T09:35:00.9773111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 33%] 2025-09-07T09:35:00.9773387Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9773660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 33%] 2025-09-07T09:35:00.9773923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 33%] 2025-09-07T09:35:00.9774183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 33%] 2025-09-07T09:35:00.9774443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 33%] 2025-09-07T09:35:00.9774698Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9774959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_408_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 33%] 2025-09-07T09:35:00.9775219Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 33%] 2025-09-07T09:35:00.9775481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 33%] 2025-09-07T09:35:00.9775736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 33%] 2025-09-07T09:35:00.9776011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.9776286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9776623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9776883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9777144Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9777427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9777702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9779157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9779421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9779682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 34%] 2025-09-07T09:35:00.9779945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.9780199Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9780455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9780712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9780969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.9781226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9781528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9781818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9782077Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9782331Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9782588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 34%] 2025-09-07T09:35:00.9782862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 34%] 2025-09-07T09:35:00.9783135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9783389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9783645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9783898Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9784155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9784412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9785814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9786070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9786327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 34%] 2025-09-07T09:35:00.9786691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9786965Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 34%] 2025-09-07T09:35:00.9787224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 34%] 2025-09-07T09:35:00.9787482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 34%] 2025-09-07T09:35:00.9787741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9788016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 34%] 2025-09-07T09:35:00.9788289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 34%] 2025-09-07T09:35:00.9788545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 34%] 2025-09-07T09:35:00.9788805Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9789065Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9789319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9789576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9789835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 34%] 2025-09-07T09:35:00.9790094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 34%] 2025-09-07T09:35:00.9790252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_1_cuda PASSED [0.0008s] [ 34%] 2025-09-07T09:35:00.9790404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_2_cuda PASSED [0.0008s] [ 34%] 2025-09-07T09:35:00.9790553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_3_cuda 
PASSED [0.0007s] [ 34%] 2025-09-07T09:35:00.9791829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_4_cuda PASSED [0.0008s] [ 35%] 2025-09-07T09:35:00.9792115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0119s] [ 35%] 2025-09-07T09:35:00.9792392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 35%] 2025-09-07T09:35:00.9792645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0086s] [ 35%] 2025-09-07T09:35:00.9792899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.9793171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0079s] [ 35%] 2025-09-07T09:35:00.9793439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.9793695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0109s] [ 35%] 
2025-09-07T09:35:00.9793953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 35%] 2025-09-07T09:35:00.9794209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0108s] [ 35%] 2025-09-07T09:35:00.9794465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 35%] 2025-09-07T09:35:00.9794719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0095s] [ 35%] 2025-09-07T09:35:00.9794975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 35%] 2025-09-07T09:35:00.9795234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0166s] [ 35%] 2025-09-07T09:35:00.9795494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.9795745Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0052s] [ 35%] 2025-09-07T09:35:00.9796014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.9796279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0077s] [ 35%] 2025-09-07T09:35:00.9796604Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 35%] 2025-09-07T09:35:00.9796858Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0125s] [ 35%] 2025-09-07T09:35:00.9797117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.9798547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0106s] [ 35%] 2025-09-07T09:35:00.9798828Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.9799080Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0100s] [ 35%] 2025-09-07T09:35:00.9799336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 35%] 2025-09-07T09:35:00.9799591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0107s] [ 35%] 2025-09-07T09:35:00.9799848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.9800098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0089s] [ 35%] 2025-09-07T09:35:00.9800352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.9800606Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0067s] [ 35%] 2025-09-07T09:35:00.9800861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 35%] 2025-09-07T09:35:00.9801116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0132s] [ 35%] 2025-09-07T09:35:00.9801412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 35%] 2025-09-07T09:35:00.9801679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0125s] [ 35%] 2025-09-07T09:35:00.9801933Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 35%] 2025-09-07T09:35:00.9802184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0080s] [ 35%] 2025-09-07T09:35:00.9802438Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 35%] 2025-09-07T09:35:00.9802708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0110s] [ 35%] 2025-09-07T09:35:00.9802976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.9803230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0101s] [ 35%] 2025-09-07T09:35:00.9803481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 35%] 2025-09-07T09:35:00.9804857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0069s] [ 35%] 2025-09-07T09:35:00.9805112Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 35%] 2025-09-07T09:35:00.9805366Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0111s] [ 35%] 2025-09-07T09:35:00.9805623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.9805875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0106s] [ 35%] 2025-09-07T09:35:00.9806127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 35%] 2025-09-07T09:35:00.9806376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0079s] [ 35%] 2025-09-07T09:35:00.9806729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.9807000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0074s] [ 36%] 2025-09-07T09:35:00.9807255Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.9807505Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0074s] [ 36%] 2025-09-07T09:35:00.9807758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 36%] 2025-09-07T09:35:00.9808048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0069s] [ 36%] 2025-09-07T09:35:00.9808319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 36%] 2025-09-07T09:35:00.9808573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0091s] [ 36%] 2025-09-07T09:35:00.9808831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.9809085Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0079s] [ 36%] 2025-09-07T09:35:00.9809338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.9809589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0081s] [ 36%] 2025-09-07T09:35:00.9809844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 36%] 2025-09-07T09:35:00.9810097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0097s] [ 36%] 2025-09-07T09:35:00.9811479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.9811730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0089s] [ 36%] 2025-09-07T09:35:00.9812007Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.9812274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0062s] [ 36%] 2025-09-07T09:35:00.9812526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 36%] 2025-09-07T09:35:00.9812778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0091s] [ 36%] 2025-09-07T09:35:00.9813033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.9813302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0089s] [ 36%] 2025-09-07T09:35:00.9813568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.9813818Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0077s] [ 36%] 2025-09-07T09:35:00.9814071Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.9814329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0092s] [ 36%] 2025-09-07T09:35:00.9814588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 36%] 2025-09-07T09:35:00.9814840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0098s] [ 36%] 2025-09-07T09:35:00.9815096Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 36%] 2025-09-07T09:35:00.9815349Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0082s] [ 36%] 2025-09-07T09:35:00.9815604Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 36%] 2025-09-07T09:35:00.9815860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0120s] [ 36%] 2025-09-07T09:35:00.9816138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.9816406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0113s] [ 36%] 2025-09-07T09:35:00.9817859Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 36%] 2025-09-07T09:35:00.9818113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0083s] [ 36%] 2025-09-07T09:35:00.9818367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 36%] 2025-09-07T09:35:00.9818671Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0102s] [ 36%] 2025-09-07T09:35:00.9818950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.9819310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0094s] [ 36%] 2025-09-07T09:35:00.9819562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 36%] 2025-09-07T09:35:00.9819815Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0080s] [ 36%] 2025-09-07T09:35:00.9820072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 36%] 2025-09-07T09:35:00.9820326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0113s] [ 36%] 2025-09-07T09:35:00.9820582Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.9820836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0119s] [ 36%] 2025-09-07T09:35:00.9821089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 36%] 2025-09-07T09:35:00.9821338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0087s] [ 36%] 2025-09-07T09:35:00.9821622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.9821905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0045s] [ 37%] 2025-09-07T09:35:00.9822168Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 37%] 2025-09-07T09:35:00.9822423Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.9822696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 37%] 2025-09-07T09:35:00.9822967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.9823221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 37%] 2025-09-07T09:35:00.9824615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 37%] 2025-09-07T09:35:00.9824879Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.9825133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0098s] [ 37%] 2025-09-07T09:35:00.9825388Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 37%] 2025-09-07T09:35:00.9825644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 37%] 2025-09-07T09:35:00.9825901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.9826155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 37%] 2025-09-07T09:35:00.9826412Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 37%] 2025-09-07T09:35:00.9826780Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 37%] 2025-09-07T09:35:00.9827055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 37%] 2025-09-07T09:35:00.9827307Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0058s] [ 37%] 2025-09-07T09:35:00.9827561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 37%] 2025-09-07T09:35:00.9827817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 37%] 2025-09-07T09:35:00.9828098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 37%] 2025-09-07T09:35:00.9828369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0080s] [ 37%] 2025-09-07T09:35:00.9828623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 37%] 2025-09-07T09:35:00.9828876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.9829132Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 37%] 2025-09-07T09:35:00.9829386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0065s] [ 37%] 2025-09-07T09:35:00.9829641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 37%] 2025-09-07T09:35:00.9831022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0061s] [ 37%] 2025-09-07T09:35:00.9831284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 37%] 2025-09-07T09:35:00.9831535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0047s] [ 37%] 2025-09-07T09:35:00.9831787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 37%] 2025-09-07T09:35:00.9832067Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 37%] 2025-09-07T09:35:00.9832337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 37%] 2025-09-07T09:35:00.9832588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0071s] [ 37%] 2025-09-07T09:35:00.9832843Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 37%] 2025-09-07T09:35:00.9833094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 37%] 2025-09-07T09:35:00.9833362Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 37%] 2025-09-07T09:35:00.9833631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 37%] 2025-09-07T09:35:00.9833885Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 37%] 2025-09-07T09:35:00.9834135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0068s] [ 37%] 2025-09-07T09:35:00.9834388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 37%] 2025-09-07T09:35:00.9834639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0046s] [ 37%] 2025-09-07T09:35:00.9834890Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 37%] 2025-09-07T09:35:00.9835143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0090s] [ 37%] 2025-09-07T09:35:00.9835400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.9835654Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0084s] [ 37%] 2025-09-07T09:35:00.9835908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 37%] 2025-09-07T09:35:00.9836174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0061s] [ 37%] 2025-09-07T09:35:00.9837656Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.9837913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 38%] 2025-09-07T09:35:00.9838169Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 38%] 2025-09-07T09:35:00.9838422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 38%] 2025-09-07T09:35:00.9838719Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 38%] 2025-09-07T09:35:00.9838989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 38%] 2025-09-07T09:35:00.9839245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 38%] 2025-09-07T09:35:00.9839502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 38%] 2025-09-07T09:35:00.9839760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 38%] 2025-09-07T09:35:00.9840012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 38%] 2025-09-07T09:35:00.9840265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.9840516Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 38%] 2025-09-07T09:35:00.9840770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 38%] 2025-09-07T09:35:00.9841024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 38%] 2025-09-07T09:35:00.9841281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.9841547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 38%] 2025-09-07T09:35:00.9841817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.9842066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 38%] 2025-09-07T09:35:00.9842317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 
PASSED [0.0024s] [ 38%] 2025-09-07T09:35:00.9842571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 38%] 2025-09-07T09:35:00.9843963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.9844234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 38%] 2025-09-07T09:35:00.9844486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 38%] 2025-09-07T09:35:00.9844739Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 38%] 2025-09-07T09:35:00.9844991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.9845246Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.9845502Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.9845754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0074s] [ 38%] 2025-09-07T09:35:00.9846009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 38%] 2025-09-07T09:35:00.9846262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0060s] [ 38%] 2025-09-07T09:35:00.9846581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 38%] 2025-09-07T09:35:00.9846865Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 38%] 2025-09-07T09:35:00.9847144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.9847397Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0098s] [ 38%] 2025-09-07T09:35:00.9847652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.9847905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 38%] 2025-09-07T09:35:00.9848178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 38%] 2025-09-07T09:35:00.9848450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 38%] 2025-09-07T09:35:00.9848705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 38%] 2025-09-07T09:35:00.9848955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 38%] 2025-09-07T09:35:00.9849213Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 38%] 2025-09-07T09:35:00.9850590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0059s] [ 38%] 2025-09-07T09:35:00.9850845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 38%] 2025-09-07T09:35:00.9851099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0087s] [ 38%] 2025-09-07T09:35:00.9851357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 38%] 2025-09-07T09:35:00.9851608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 38%] 2025-09-07T09:35:00.9851860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 38%] 2025-09-07T09:35:00.9852126Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0066s] [ 38%] 2025-09-07T09:35:00.9852396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 39%] 2025-09-07T09:35:00.9852655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.1408s] [ 39%] 2025-09-07T09:35:00.9852912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 39%] 2025-09-07T09:35:00.9853164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0062s] [ 39%] 2025-09-07T09:35:00.9853432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0058s] [ 39%] 2025-09-07T09:35:00.9853697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0061s] [ 39%] 2025-09-07T09:35:00.9853952Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0054s] [ 39%] 2025-09-07T09:35:00.9854208Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0059s] [ 39%] 2025-09-07T09:35:00.9854472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0070s] [ 39%] 2025-09-07T09:35:00.9854727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0069s] [ 39%] 2025-09-07T09:35:00.9854982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0069s] [ 39%] 2025-09-07T09:35:00.9855235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0072s] [ 39%] 2025-09-07T09:35:00.9855492Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0067s] [ 39%] 2025-09-07T09:35:00.9857327Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0060s] [ 39%] 2025-09-07T09:35:00.9857591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 39%] 2025-09-07T09:35:00.9857882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0104s] [ 39%] 2025-09-07T09:35:00.9858155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0058s] [ 39%] 2025-09-07T09:35:00.9858411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0055s] [ 39%] 2025-09-07T09:35:00.9858665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0050s] [ 39%] 2025-09-07T09:35:00.9858919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0102s] [ 39%] 2025-09-07T09:35:00.9859254Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0072s] [ 39%] 2025-09-07T09:35:00.9859520Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0076s] [ 39%] 2025-09-07T09:35:00.9859774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0072s] [ 39%] 2025-09-07T09:35:00.9860027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0070s] [ 39%] 2025-09-07T09:35:00.9860281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0061s] [ 39%] 2025-09-07T09:35:00.9860535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 39%] 2025-09-07T09:35:00.9860792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 39%] 2025-09-07T09:35:00.9861044Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 39%] 2025-09-07T09:35:00.9861299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 39%] 2025-09-07T09:35:00.9861551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 39%] 2025-09-07T09:35:00.9861803Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 39%] 2025-09-07T09:35:00.9862073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 39%] 2025-09-07T09:35:00.9862343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 39%] 2025-09-07T09:35:00.9862592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0041s] [ 39%] 2025-09-07T09:35:00.9864002Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 39%] 2025-09-07T09:35:00.9864277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 39%] 2025-09-07T09:35:00.9864544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0036s] [ 39%] 2025-09-07T09:35:00.9864804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0078s] [ 39%] 2025-09-07T09:35:00.9865061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 39%] 2025-09-07T09:35:00.9865316Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0037s] [ 39%] 2025-09-07T09:35:00.9865569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 39%] 2025-09-07T09:35:00.9865818Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0037s] [ 39%] 2025-09-07T09:35:00.9866072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 39%] 2025-09-07T09:35:00.9866328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0043s] [ 39%] 2025-09-07T09:35:00.9866643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.9866894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0042s] [ 39%] 2025-09-07T09:35:00.9867171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 39%] 2025-09-07T09:35:00.9867439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0037s] [ 39%] 2025-09-07T09:35:00.9867692Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0038s] [ 40%] 2025-09-07T09:35:00.9867943Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 40%] 2025-09-07T09:35:00.9868199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 40%] 2025-09-07T09:35:00.9868474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0041s] [ 40%] 2025-09-07T09:35:00.9868746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 40%] 2025-09-07T09:35:00.9868998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0043s] [ 40%] 2025-09-07T09:35:00.9870373Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 40%] 2025-09-07T09:35:00.9870631Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 40%] 2025-09-07T09:35:00.9870889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 40%] 2025-09-07T09:35:00.9871141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 40%] 2025-09-07T09:35:00.9871395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.9871649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.9871905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.9872161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 40%] 2025-09-07T09:35:00.9872431Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 40%] 2025-09-07T09:35:00.9872696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 40%] 2025-09-07T09:35:00.9872949Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 40%] 2025-09-07T09:35:00.9873198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.9873450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 40%] 2025-09-07T09:35:00.9873715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 40%] 2025-09-07T09:35:00.9873990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.9874240Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 40%] 2025-09-07T09:35:00.9874493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 40%] 2025-09-07T09:35:00.9874744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 40%] 2025-09-07T09:35:00.9874997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 40%] 2025-09-07T09:35:00.9875251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0866s] [ 40%] 2025-09-07T09:35:00.9875508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0054s] [ 40%] 2025-09-07T09:35:00.9876957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0057s] [ 40%] 2025-09-07T09:35:00.9877214Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0053s] [ 40%] 2025-09-07T09:35:00.9877468Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0053s] [ 40%] 2025-09-07T09:35:00.9877771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0050s] [ 40%] 2025-09-07T09:35:00.9878045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0055s] [ 40%] 2025-09-07T09:35:00.9878303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0066s] [ 40%] 2025-09-07T09:35:00.9878553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0065s] [ 40%] 2025-09-07T09:35:00.9878807Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0065s] [ 40%] 2025-09-07T09:35:00.9879082Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0066s] [ 40%] 2025-09-07T09:35:00.9879351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0062s] [ 40%] 2025-09-07T09:35:00.9879609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0058s] [ 40%] 2025-09-07T09:35:00.9879866Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0056s] [ 40%] 2025-09-07T09:35:00.9880116Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0056s] [ 40%] 2025-09-07T09:35:00.9880369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0051s] [ 40%] 2025-09-07T09:35:00.9880619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 40%] 2025-09-07T09:35:00.9880870Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 40%] 2025-09-07T09:35:00.9881124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0057s] [ 40%] 2025-09-07T09:35:00.9881380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0067s] [ 40%] 2025-09-07T09:35:00.9881631Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0067s] [ 40%] 2025-09-07T09:35:00.9881897Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0068s] [ 40%] 2025-09-07T09:35:00.9883285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0066s] [ 40%] 2025-09-07T09:35:00.9883542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0060s] [ 41%] 2025-09-07T09:35:00.9883794Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 41%] 2025-09-07T09:35:00.9884050Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 41%] 2025-09-07T09:35:00.9884318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 41%] 2025-09-07T09:35:00.9884583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 41%] 2025-09-07T09:35:00.9884834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0058s] [ 41%] 2025-09-07T09:35:00.9885088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 41%] 2025-09-07T09:35:00.9885342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 41%] 2025-09-07T09:35:00.9885597Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 41%] 2025-09-07T09:35:00.9885845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 41%] 2025-09-07T09:35:00.9886098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 41%] 2025-09-07T09:35:00.9886351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9886661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9886912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9887193Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9887460Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9887712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9887959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 41%] 2025-09-07T09:35:00.9888209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 41%] 2025-09-07T09:35:00.9888484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 41%] 2025-09-07T09:35:00.9889882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 41%] 2025-09-07T09:35:00.9890134Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 41%] 2025-09-07T09:35:00.9890388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0032s] [ 41%] 2025-09-07T09:35:00.9890640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 41%] 2025-09-07T09:35:00.9890894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 41%] 2025-09-07T09:35:00.9891144Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 41%] 2025-09-07T09:35:00.9891397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 41%] 2025-09-07T09:35:00.9891645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 41%] 2025-09-07T09:35:00.9891894Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 41%] 2025-09-07T09:35:00.9892141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 41%] 2025-09-07T09:35:00.9892409Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 41%] 2025-09-07T09:35:00.9892695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 41%] 2025-09-07T09:35:00.9892950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9893198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 41%] 2025-09-07T09:35:00.9893448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 41%] 2025-09-07T09:35:00.9893720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 41%] 2025-09-07T09:35:00.9893985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 41%] 2025-09-07T09:35:00.9894235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0039s] [ 41%] 2025-09-07T09:35:00.9894487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 41%] 2025-09-07T09:35:00.9894735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 41%] 2025-09-07T09:35:00.9896085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 41%] 2025-09-07T09:35:00.9896338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 41%] 2025-09-07T09:35:00.9897291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 41%] 2025-09-07T09:35:00.9897544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 41%] 2025-09-07T09:35:00.9897797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 41%] 2025-09-07T09:35:00.9898044Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 41%] 2025-09-07T09:35:00.9898327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 41%] 2025-09-07T09:35:00.9898592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 41%] 2025-09-07T09:35:00.9898842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 42%] 2025-09-07T09:35:00.9899148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 42%] 2025-09-07T09:35:00.9899402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 42%] 2025-09-07T09:35:00.9899669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 42%] 2025-09-07T09:35:00.9899935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED 
[0.0024s] [ 42%] 2025-09-07T09:35:00.9900183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 42%] 2025-09-07T09:35:00.9900433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 42%] 2025-09-07T09:35:00.9900686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 42%] 2025-09-07T09:35:00.9900939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 42%] 2025-09-07T09:35:00.9901183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 42%] 2025-09-07T09:35:00.9901437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 42%] 2025-09-07T09:35:00.9901685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 42%] 2025-09-07T09:35:00.9901934Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 42%] 2025-09-07T09:35:00.9903406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 42%] 2025-09-07T09:35:00.9903679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 42%] 2025-09-07T09:35:00.9903940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 42%] 2025-09-07T09:35:00.9904188Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 42%] 2025-09-07T09:35:00.9904434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 42%] 2025-09-07T09:35:00.9904681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 42%] 2025-09-07T09:35:00.9904954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 
42%] 2025-09-07T09:35:00.9905220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 42%] 2025-09-07T09:35:00.9905469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 42%] 2025-09-07T09:35:00.9905719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 42%] 2025-09-07T09:35:00.9905969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 42%] 2025-09-07T09:35:00.9906219Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 42%] 2025-09-07T09:35:00.9906470Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 42%] 2025-09-07T09:35:00.9906796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 42%] 2025-09-07T09:35:00.9907047Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 42%] 2025-09-07T09:35:00.9907299Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 42%] 2025-09-07T09:35:00.9907549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 42%] 2025-09-07T09:35:00.9907833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 42%] 2025-09-07T09:35:00.9908103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 42%] 2025-09-07T09:35:00.9908358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 42%] 2025-09-07T09:35:00.9909876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 42%] 2025-09-07T09:35:00.9910137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0028s] [ 42%] 2025-09-07T09:35:00.9910418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 42%] 2025-09-07T09:35:00.9910691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 42%] 2025-09-07T09:35:00.9910940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 42%] 2025-09-07T09:35:00.9911195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 42%] 2025-09-07T09:35:00.9911444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 42%] 2025-09-07T09:35:00.9911694Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 42%] 2025-09-07T09:35:00.9911941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 42%] 2025-09-07T09:35:00.9912192Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 42%] 2025-09-07T09:35:00.9912443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 42%] 2025-09-07T09:35:00.9912697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 42%] 2025-09-07T09:35:00.9912946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 42%] 2025-09-07T09:35:00.9913211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 42%] 2025-09-07T09:35:00.9913471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 42%] 2025-09-07T09:35:00.9913722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 43%] 2025-09-07T09:35:00.9913979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0033s] [ 43%] 2025-09-07T09:35:00.9914239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 43%] 2025-09-07T09:35:00.9914507Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 43%] 2025-09-07T09:35:00.9914774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 43%] 2025-09-07T09:35:00.9915031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 43%] 2025-09-07T09:35:00.9916613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 43%] 2025-09-07T09:35:00.9916877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0102s] [ 43%] 2025-09-07T09:35:00.9917136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.9917391Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 43%] 2025-09-07T09:35:00.9917646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.9917898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 43%] 2025-09-07T09:35:00.9918154Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 43%] 2025-09-07T09:35:00.9918407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.9918703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 43%] 2025-09-07T09:35:00.9918974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.9919226Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 43%] 2025-09-07T09:35:00.9919476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 43%] 2025-09-07T09:35:00.9919729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 43%] 2025-09-07T09:35:00.9920003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 43%] 2025-09-07T09:35:00.9920275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 43%] 2025-09-07T09:35:00.9920527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 43%] 2025-09-07T09:35:00.9920783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 43%] 2025-09-07T09:35:00.9921036Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 43%] 2025-09-07T09:35:00.9921291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 43%] 2025-09-07T09:35:00.9921542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 43%] 2025-09-07T09:35:00.9922961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 43%] 2025-09-07T09:35:00.9923218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 43%] 2025-09-07T09:35:00.9923471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 43%] 2025-09-07T09:35:00.9923721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0051s] [ 43%] 2025-09-07T09:35:00.9923994Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 43%] 2025-09-07T09:35:00.9924263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 43%] 2025-09-07T09:35:00.9924521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 43%] 2025-09-07T09:35:00.9924772Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 43%] 2025-09-07T09:35:00.9925025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 43%] 2025-09-07T09:35:00.9925297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0056s] [ 43%] 2025-09-07T09:35:00.9925561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 43%] 2025-09-07T09:35:00.9925815Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 43%] 2025-09-07T09:35:00.9926067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 43%] 2025-09-07T09:35:00.9926321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 43%] 2025-09-07T09:35:00.9926643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 43%] 2025-09-07T09:35:00.9926891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 43%] 2025-09-07T09:35:00.9927143Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 43%] 2025-09-07T09:35:00.9927395Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 43%] 2025-09-07T09:35:00.9927651Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 43%] 2025-09-07T09:35:00.9927898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0039s] [ 43%] 2025-09-07T09:35:00.9928190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 43%] 2025-09-07T09:35:00.9929600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 43%] 2025-09-07T09:35:00.9929857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9930110Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 44%] 2025-09-07T09:35:00.9930365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 44%] 2025-09-07T09:35:00.9930654Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9930922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 44%] 2025-09-07T09:35:00.9931173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 44%] 2025-09-07T09:35:00.9931423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 44%] 2025-09-07T09:35:00.9931678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 44%] 2025-09-07T09:35:00.9931936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 44%] 2025-09-07T09:35:00.9932186Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 44%] 2025-09-07T09:35:00.9932439Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 44%] 2025-09-07T09:35:00.9932690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0041s] [ 44%] 2025-09-07T09:35:00.9932944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 44%] 2025-09-07T09:35:00.9933194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 44%] 2025-09-07T09:35:00.9933463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 44%] 2025-09-07T09:35:00.9933723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 44%] 2025-09-07T09:35:00.9933975Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 44%] 2025-09-07T09:35:00.9934224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 
PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9934473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 44%] 2025-09-07T09:35:00.9935861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 44%] 2025-09-07T09:35:00.9936131Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9936380Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0039s] [ 44%] 2025-09-07T09:35:00.9936717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 44%] 2025-09-07T09:35:00.9936968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 44%] 2025-09-07T09:35:00.9937220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9937476Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 44%] 2025-09-07T09:35:00.9937733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 44%] 2025-09-07T09:35:00.9937984Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 44%] 2025-09-07T09:35:00.9938237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 44%] 2025-09-07T09:35:00.9938488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9938775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 44%] 2025-09-07T09:35:00.9939135Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0088s] [ 44%] 2025-09-07T09:35:00.9939393Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9939645Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 44%] 2025-09-07T09:35:00.9939944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 44%] 2025-09-07T09:35:00.9940229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0072s] [ 44%] 2025-09-07T09:35:00.9940481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 44%] 2025-09-07T09:35:00.9940734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0071s] [ 44%] 2025-09-07T09:35:00.9940989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 44%] 2025-09-07T09:35:00.9941239Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 44%] 2025-09-07T09:35:00.9942666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 44%] 2025-09-07T09:35:00.9942918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 44%] 2025-09-07T09:35:00.9943176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 44%] 2025-09-07T09:35:00.9943428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 44%] 2025-09-07T09:35:00.9943682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.9943957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0090s] [ 44%] 2025-09-07T09:35:00.9944223Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 44%] 2025-09-07T09:35:00.9944471Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 44%] 2025-09-07T09:35:00.9944721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 45%] 2025-09-07T09:35:00.9944977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 45%] 2025-09-07T09:35:00.9945251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9945523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9945777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9946030Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9946285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9946614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 45%] 2025-09-07T09:35:00.9946870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.9947125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.9947382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.9947637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 45%] 2025-09-07T09:35:00.9949043Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.9949340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.9949613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.9949863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.9950117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.9950367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9950640Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9950911Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 45%] 2025-09-07T09:35:00.9951167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9951417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9951670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9951920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9952171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9952424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 45%] 2025-09-07T09:35:00.9952680Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9952928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 45%] 2025-09-07T09:35:00.9953182Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9953450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.9953718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 45%] 2025-09-07T09:35:00.9953971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 45%] 2025-09-07T09:35:00.9954226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 45%] 2025-09-07T09:35:00.9955604Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 45%] 2025-09-07T09:35:00.9955875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 45%] 2025-09-07T09:35:00.9956138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 45%] 2025-09-07T09:35:00.9956392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.9956704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 45%] 2025-09-07T09:35:00.9956956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 45%] 2025-09-07T09:35:00.9957204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 45%] 2025-09-07T09:35:00.9957453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 
PASSED [0.0024s] [ 45%] 2025-09-07T09:35:00.9957700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 45%] 2025-09-07T09:35:00.9957953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 45%] 2025-09-07T09:35:00.9958204Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 45%] 2025-09-07T09:35:00.9958457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 45%] 2025-09-07T09:35:00.9958730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 45%] 2025-09-07T09:35:00.9958998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 45%] 2025-09-07T09:35:00.9959247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 45%] 2025-09-07T09:35:00.9959498Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 46%] 2025-09-07T09:35:00.9959749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 46%] 2025-09-07T09:35:00.9960019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9960283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 46%] 2025-09-07T09:35:00.9960533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9961918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 46%] 2025-09-07T09:35:00.9962178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9962431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0035s] [ 46%] 2025-09-07T09:35:00.9962685Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 46%] 2025-09-07T09:35:00.9962934Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 46%] 2025-09-07T09:35:00.9963186Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 46%] 2025-09-07T09:35:00.9963435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 46%] 2025-09-07T09:35:00.9963688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 46%] 2025-09-07T09:35:00.9963954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 46%] 2025-09-07T09:35:00.9964222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9964471Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 46%] 2025-09-07T09:35:00.9964721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9964968Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9965236Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9965498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 46%] 2025-09-07T09:35:00.9965751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 46%] 2025-09-07T09:35:00.9966001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 46%] 2025-09-07T09:35:00.9966252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED 
[0.0027s] [ 46%] 2025-09-07T09:35:00.9966581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 46%] 2025-09-07T09:35:00.9966831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 46%] 2025-09-07T09:35:00.9967086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 46%] 2025-09-07T09:35:00.9968469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9968722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9968974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9969258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 46%] 2025-09-07T09:35:00.9969528Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9969781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 46%] 2025-09-07T09:35:00.9970038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 46%] 2025-09-07T09:35:00.9970287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 46%] 2025-09-07T09:35:00.9970556Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 46%] 2025-09-07T09:35:00.9970820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 46%] 2025-09-07T09:35:00.9971072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 46%] 2025-09-07T09:35:00.9971324Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 46%] 2025-09-07T09:35:00.9971579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9971827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9972077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 46%] 2025-09-07T09:35:00.9972326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9972577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 46%] 2025-09-07T09:35:00.9972831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 46%] 2025-09-07T09:35:00.9973115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 
PASSED [0.0028s] [ 46%] 2025-09-07T09:35:00.9973385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 46%] 2025-09-07T09:35:00.9975138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 46%] 2025-09-07T09:35:00.9975404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 46%] 2025-09-07T09:35:00.9975654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 47%] 2025-09-07T09:35:00.9975911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 47%] 2025-09-07T09:35:00.9976188Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 47%] 2025-09-07T09:35:00.9976453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 47%] 2025-09-07T09:35:00.9976774Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 47%] 2025-09-07T09:35:00.9977027Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 47%] 2025-09-07T09:35:00.9977281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 47%] 2025-09-07T09:35:00.9977538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 47%] 2025-09-07T09:35:00.9977794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 47%] 2025-09-07T09:35:00.9978049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 47%] 2025-09-07T09:35:00.9978305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 47%] 2025-09-07T09:35:00.9978558Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9978812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9979166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9979439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9979694Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9979948Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9980198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 47%] 2025-09-07T09:35:00.9981792Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 47%] 2025-09-07T09:35:00.9982070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 47%] 2025-09-07T09:35:00.9982329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 47%] 2025-09-07T09:35:00.9982582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 47%] 2025-09-07T09:35:00.9982835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 47%] 2025-09-07T09:35:00.9983087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 47%] 2025-09-07T09:35:00.9983340Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9983592Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 47%] 2025-09-07T09:35:00.9983848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 47%] 2025-09-07T09:35:00.9984100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 47%] 2025-09-07T09:35:00.9984351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 47%] 2025-09-07T09:35:00.9984616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 47%] 2025-09-07T09:35:00.9984888Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 47%] 2025-09-07T09:35:00.9985142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 47%] 2025-09-07T09:35:00.9985398Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9985649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 47%] 2025-09-07T09:35:00.9985917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 47%] 2025-09-07T09:35:00.9986183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 47%] 2025-09-07T09:35:00.9986437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 47%] 2025-09-07T09:35:00.9986837Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 47%] 2025-09-07T09:35:00.9988309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 47%] 2025-09-07T09:35:00.9988564Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 47%] 2025-09-07T09:35:00.9988817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 47%] 2025-09-07T09:35:00.9989068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 47%] 2025-09-07T09:35:00.9989320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 47%] 2025-09-07T09:35:00.9989573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 47%] 2025-09-07T09:35:00.9989827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 47%] 2025-09-07T09:35:00.9990118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9990387Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9990637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 47%] 2025-09-07T09:35:00.9990888Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9991141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:00.9991413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 48%] 2025-09-07T09:35:00.9991679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 48%] 2025-09-07T09:35:00.9991936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 48%] 2025-09-07T09:35:00.9992190Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0022s] [ 48%] 2025-09-07T09:35:00.9992443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 48%] 2025-09-07T09:35:00.9992696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 48%] 2025-09-07T09:35:00.9992950Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.9993200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 48%] 2025-09-07T09:35:00.9993454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.9994823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.9995076Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:00.9995347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:00.9995614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:00.9995862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.9996111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 48%] 2025-09-07T09:35:00.9996359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9996697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 48%] 2025-09-07T09:35:00.9996966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9997223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9997474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9997729Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9997979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.9998229Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:00.9998484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:00.9998741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 48%] 2025-09-07T09:35:00.9998992Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:00.9999245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 48%] 2025-09-07T09:35:00.9999526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 48%] 2025-09-07T09:35:00.9999797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 48%] 2025-09-07T09:35:01.0001170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:01.0001430Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:01.0001681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 48%] 2025-09-07T09:35:01.0001958Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 48%] 2025-09-07T09:35:01.0002222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:01.0002475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:01.0002730Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:01.0002988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:01.0003238Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 48%] 2025-09-07T09:35:01.0003489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 48%] 2025-09-07T09:35:01.0003739Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 48%] 2025-09-07T09:35:01.0003991Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 48%] 2025-09-07T09:35:01.0004242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:01.0004497Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:01.0004770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 48%] 2025-09-07T09:35:01.0005040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 48%] 2025-09-07T09:35:01.0005293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 48%] 2025-09-07T09:35:01.0005544Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:01.0005797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 49%] 2025-09-07T09:35:01.0006068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0006327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 49%] 2025-09-07T09:35:01.0007808Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:01.0008061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:01.0008313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0008569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0034s] [ 49%] 2025-09-07T09:35:01.0008823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:01.0009073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 49%] 2025-09-07T09:35:01.0009326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:01.0009577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 49%] 2025-09-07T09:35:01.0009829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:01.0010118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 49%] 2025-09-07T09:35:01.0010387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:01.0010637Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 49%] 2025-09-07T09:35:01.0010890Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:01.0011138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0011405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0011673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:01.0011926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:01.0012172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 49%] 2025-09-07T09:35:01.0012424Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED 
[0.0028s] [ 49%] 2025-09-07T09:35:01.0012671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 49%] 2025-09-07T09:35:01.0014017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:01.0014274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:01.0014529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0014778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 49%] 2025-09-07T09:35:01.0015028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0015294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0036s] [ 49%] 2025-09-07T09:35:01.0015557Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 49%] 2025-09-07T09:35:01.0015809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 49%] 2025-09-07T09:35:01.0016060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:01.0016311Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 49%] 2025-09-07T09:35:01.0016793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 49%] 2025-09-07T09:35:01.0017057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:01.0017307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 49%] 2025-09-07T09:35:01.0017557Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0029s] [ 49%] 2025-09-07T09:35:01.0017809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:01.0018055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:01.0018303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 49%] 2025-09-07T09:35:01.0018548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:01.0018801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 49%] 2025-09-07T09:35:01.0019091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 49%] 2025-09-07T09:35:01.0019342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:01.0020727Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 49%] 2025-09-07T09:35:01.0020996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 49%] 2025-09-07T09:35:01.0021244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:01.0021493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:01.0021744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 50%] 2025-09-07T09:35:01.0022015Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:01.0022280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 50%] 2025-09-07T09:35:01.0022534Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 
50%] 2025-09-07T09:35:01.0022781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:01.0023030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 50%] 2025-09-07T09:35:01.0023278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 50%] 2025-09-07T09:35:01.0023530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0023778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 50%] 2025-09-07T09:35:01.0024029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0024280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0024529Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:01.0024795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 50%] 2025-09-07T09:35:01.0025060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:01.0025307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0025554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:01.0026964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 50%] 2025-09-07T09:35:01.0027255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:01.0027529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 50%] 
2025-09-07T09:35:01.0027782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 50%] 2025-09-07T09:35:01.0028029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:01.0028278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:01.0028524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:01.0028771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:01.0029021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 50%] 2025-09-07T09:35:01.0029276Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 50%] 2025-09-07T09:35:01.0029524Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:01.0029774Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 50%] 2025-09-07T09:35:01.0030052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0030320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 50%] 2025-09-07T09:35:01.0030570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 50%] 2025-09-07T09:35:01.0030821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0031068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 50%] 2025-09-07T09:35:01.0031330Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED 
[0.0027s] [ 50%] 2025-09-07T09:35:01.0031589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 50%] 2025-09-07T09:35:01.0031838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 50%] 2025-09-07T09:35:01.0032088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 50%] 2025-09-07T09:35:01.0033455Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 50%] 2025-09-07T09:35:01.0033703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 50%] 2025-09-07T09:35:01.0033949Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 50%] 2025-09-07T09:35:01.0034194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 50%] 2025-09-07T09:35:01.0034442Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 50%] 2025-09-07T09:35:01.0034690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 50%] 2025-09-07T09:35:01.0034942Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 50%] 2025-09-07T09:35:01.0035212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 50%] 2025-09-07T09:35:01.0035475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 50%] 2025-09-07T09:35:01.0035722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 51%] 2025-09-07T09:35:01.0035969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 51%] 2025-09-07T09:35:01.0036226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0670s] [ 51%] 2025-09-07T09:35:01.0036578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0061s] [ 51%] 2025-09-07T09:35:01.0036850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0060s] [ 51%] 2025-09-07T09:35:01.0037103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0060s] [ 51%] 2025-09-07T09:35:01.0037356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0061s] [ 51%] 2025-09-07T09:35:01.0037616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0055s] [ 51%] 2025-09-07T09:35:01.0037954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0037s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0038289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0038619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0040061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0040391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0027s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0040740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0041020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 51%] 2025-09-07T09:35:01.0041279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0054s] [ 51%] 2025-09-07T09:35:01.0041533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0060s] [ 51%] 2025-09-07T09:35:01.0041807Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0060s] [ 51%] 2025-09-07T09:35:01.0042074Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0066s] [ 51%] 2025-09-07T09:35:01.0042328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0056s] [ 51%] 2025-09-07T09:35:01.0042655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0032s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0042987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0053s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0043315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0043641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0043967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0044292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0044548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:01.0044816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 51%] 2025-09-07T09:35:01.0045083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 51%] 2025-09-07T09:35:01.0045339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 51%] 2025-09-07T09:35:01.0045593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 51%] 2025-09-07T09:35:01.0045847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 51%] 2025-09-07T09:35:01.0047377Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0047725Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0048051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0048378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0048704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0049033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0049293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:01.0049549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:01.0049800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:01.0050068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 51%] 2025-09-07T09:35:01.0050334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0048s] [ 51%] 2025-09-07T09:35:01.0050587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 51%] 2025-09-07T09:35:01.0050914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 51%] 2025-09-07T09:35:01.0051243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0051583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0051919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 51%] 2025-09-07T09:35:01.0052242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0052567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0052821Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 52%] 2025-09-07T09:35:01.0053078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 52%] 2025-09-07T09:35:01.0054434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 52%] 2025-09-07T09:35:01.0054691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 52%] 2025-09-07T09:35:01.0054944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 52%] 2025-09-07T09:35:01.0055196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 52%] 2025-09-07T09:35:01.0055542Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0055883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0056209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0056603Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0056944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0057268Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0057524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 52%] 2025-09-07T09:35:01.0057782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 52%] 2025-09-07T09:35:01.0058033Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 52%] 2025-09-07T09:35:01.0058286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 52%] 2025-09-07T09:35:01.0058537Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0036s] [ 52%] 2025-09-07T09:35:01.0058793Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 52%] 2025-09-07T09:35:01.0059175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0059516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0059861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0060185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0061628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0061971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0062243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0547s] [ 52%] 2025-09-07T09:35:01.0062501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0054s] [ 52%] 2025-09-07T09:35:01.0062753Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0053s] [ 52%] 2025-09-07T09:35:01.0063011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0053s] [ 52%] 2025-09-07T09:35:01.0063267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0054s] [ 52%] 2025-09-07T09:35:01.0063519Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0050s] [ 52%] 2025-09-07T09:35:01.0063848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0033s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0064176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0064502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0064842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0026s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0065180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0065504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0065756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 52%] 2025-09-07T09:35:01.0066012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0049s] [ 52%] 2025-09-07T09:35:01.0066278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0050s] [ 52%] 2025-09-07T09:35:01.0066611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0051s] [ 52%] 2025-09-07T09:35:01.0066862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 52%] 2025-09-07T09:35:01.0067114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 52%] 2025-09-07T09:35:01.0068551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0029s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 52%] 2025-09-07T09:35:01.0068880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0040s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0069205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0069532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 52%] 2025-09-07T09:35:01.0069857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0070224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0070498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 53%] 2025-09-07T09:35:01.0070758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:01.0071011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:01.0071266Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:01.0071538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 53%] 2025-09-07T09:35:01.0071811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 53%] 2025-09-07T09:35:01.0072142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0072475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0072802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0073128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0073454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0073778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0074037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:01.0074295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:01.0074564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:01.0075918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:01.0076179Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:01.0076433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:01.0076876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0077222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0077547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0077872Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0078199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0078524Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0078781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 53%] 2025-09-07T09:35:01.0079040Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:01.0079291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:01.0079544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:01.0079817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:01.0080092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 53%] 2025-09-07T09:35:01.0080420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0080749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0081086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0081421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0081745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0083171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0083433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:01.0083688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 53%] 2025-09-07T09:35:01.0083941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 53%] 2025-09-07T09:35:01.0084199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:01.0084449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 53%] 2025-09-07T09:35:01.0084701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 53%] 2025-09-07T09:35:01.0085043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 53%] 2025-09-07T09:35:01.0085382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0085708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0086032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 53%] 2025-09-07T09:35:01.0086374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0086782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0087037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 54%] 2025-09-07T09:35:01.0087292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0022s] [ 54%] 2025-09-07T09:35:01.0087544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 54%] 2025-09-07T09:35:01.0087797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:01.0088052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 54%] 2025-09-07T09:35:01.0088307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:01.0088634Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0090061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0090422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0090766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0021s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0091090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0091417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0091689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 54%] 2025-09-07T09:35:01.0091959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 54%] 2025-09-07T09:35:01.0092209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0040s] [ 54%] 2025-09-07T09:35:01.0092461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0092712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 54%] 2025-09-07T09:35:01.0092963Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 54%] 2025-09-07T09:35:01.0093289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0093617Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0093939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0094261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0094597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0094932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0095187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 54%] 2025-09-07T09:35:01.0095443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0095696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0095965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0097410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 54%] 2025-09-07T09:35:01.0097665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 54%] 2025-09-07T09:35:01.0097994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0098325Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0098651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0099028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0099356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0099682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0099973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0100246Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0100498Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0077s] [ 54%] 2025-09-07T09:35:01.0100751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0101001Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0101270Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 54%] 2025-09-07T09:35:01.0101614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 54%] 2025-09-07T09:35:01.0101939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0102263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0102587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 54%] 2025-09-07T09:35:01.0102908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0103231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0104597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0436s] [ 55%] 2025-09-07T09:35:01.0104867Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0059s] [ 55%] 2025-09-07T09:35:01.0105120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0058s] [ 55%] 2025-09-07T09:35:01.0105392Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0058s] [ 55%] 2025-09-07T09:35:01.0105659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0067s] [ 55%] 2025-09-07T09:35:01.0105913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0069s] [ 55%] 2025-09-07T09:35:01.0106244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0034s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0106654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0106998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0107324Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0027s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0107650Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0107976Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0108231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 55%] 2025-09-07T09:35:01.0108488Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0060s] [ 55%] 2025-09-07T09:35:01.0108743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0058s] [ 55%] 2025-09-07T09:35:01.0108997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0059s] [ 55%] 2025-09-07T09:35:01.0109250Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0065s] [ 55%] 2025-09-07T09:35:01.0109530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0063s] [ 55%] 2025-09-07T09:35:01.0109875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0029s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0110203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0046s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0111635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0111985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0112321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0112649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0112908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:01.0113166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:01.0113417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:01.0113672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:01.0113927Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:01.0114180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 55%] 2025-09-07T09:35:01.0114506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0114847Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0115197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0115521Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0115845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0116191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0116459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 55%] 2025-09-07T09:35:01.0116781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 55%] 2025-09-07T09:35:01.0117032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 55%] 2025-09-07T09:35:01.0117289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 55%] 2025-09-07T09:35:01.0117539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 55%] 2025-09-07T09:35:01.0118905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 55%] 2025-09-07T09:35:01.0119234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 55%] 2025-09-07T09:35:01.0119560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0119886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0120243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 55%] 2025-09-07T09:35:01.0120586Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0120911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0121166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 56%] 2025-09-07T09:35:01.0121440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:01.0121706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:01.0121959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:01.0122211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:01.0122466Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 56%] 2025-09-07T09:35:01.0122794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0123119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0123444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0123770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0124093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0124428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0124697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:01.0126055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 56%] 2025-09-07T09:35:01.0126307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 56%] 2025-09-07T09:35:01.0126629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 56%] 2025-09-07T09:35:01.0126904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 56%] 2025-09-07T09:35:01.0127173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 56%] 2025-09-07T09:35:01.0127499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0127827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0128152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0128477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0128800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0129122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0129378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0672s] [ 56%] 2025-09-07T09:35:01.0129652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 56%] 2025-09-07T09:35:01.0129923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0057s] [ 56%] 2025-09-07T09:35:01.0130178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0057s] [ 56%] 2025-09-07T09:35:01.0130428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0063s] [ 56%] 2025-09-07T09:35:01.0130683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0065s] [ 56%] 2025-09-07T09:35:01.0131024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0034s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0131363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0131688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0133111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0028s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0133440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0133766Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0134023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0043s] [ 56%] 2025-09-07T09:35:01.0134279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 56%] 2025-09-07T09:35:01.0134532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0056s] [ 56%] 2025-09-07T09:35:01.0134785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0055s] [ 56%] 2025-09-07T09:35:01.0135052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0040s] [ 56%] 2025-09-07T09:35:01.0135326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0041s] [ 56%] 2025-09-07T09:35:01.0135651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0029s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 56%] 2025-09-07T09:35:01.0135979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0042s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0136317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0136717Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 56%] 2025-09-07T09:35:01.0137042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0137366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0137622Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 57%] 2025-09-07T09:35:01.0137877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0138130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0138386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0138638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 57%] 2025-09-07T09:35:01.0140043Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0140404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0140749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0141073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0141397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0141738Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0142077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0142332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 57%] 2025-09-07T09:35:01.0142588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0142838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0143089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:01.0143339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 57%] 2025-09-07T09:35:01.0143591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 57%] 2025-09-07T09:35:01.0143918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0144243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0144577Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0144912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0145232Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0145554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0145822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0146088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0147527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 57%] 2025-09-07T09:35:01.0147782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0148032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0148284Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 57%] 2025-09-07T09:35:01.0148609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0148935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0149258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0149581Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0149926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0150264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0150517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 57%] 2025-09-07T09:35:01.0150770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:01.0151036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0151301Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0151549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 57%] 2025-09-07T09:35:01.0151800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 57%] 2025-09-07T09:35:01.0152125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 57%] 2025-09-07T09:35:01.0152449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0152770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0153091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 57%] 2025-09-07T09:35:01.0154502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0154826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0155102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0155371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 58%] 2025-09-07T09:35:01.0155624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 58%] 2025-09-07T09:35:01.0155874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0156124Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0156387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 58%] 2025-09-07T09:35:01.0156797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0157121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0157446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0157769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0158088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0158409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0158661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 58%] 2025-09-07T09:35:01.0158916Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0159162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 58%] 2025-09-07T09:35:01.0159437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 58%] 2025-09-07T09:35:01.0159704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 58%] 2025-09-07T09:35:01.0159955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 58%] 2025-09-07T09:35:01.0161378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0161732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0162079Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0162396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0162716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0163038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0163288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 58%] 2025-09-07T09:35:01.0163543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0163791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0164044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 58%] 2025-09-07T09:35:01.0164290Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 58%] 2025-09-07T09:35:01.0164554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 58%] 2025-09-07T09:35:01.0164896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0165223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0165546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0165880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0166211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0166591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0166844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 58%] 2025-09-07T09:35:01.0167099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0167351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 58%] 2025-09-07T09:35:01.0168702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0168952Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 58%] 2025-09-07T09:35:01.0169203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 58%] 2025-09-07T09:35:01.0169525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 58%] 2025-09-07T09:35:01.0169875Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0170216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0170539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 58%] 2025-09-07T09:35:01.0170862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 59%] 2025-09-07T09:35:01.0171199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 59%] 2025-09-07T09:35:01.0171478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 59%] 2025-09-07T09:35:01.0171735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 59%] 2025-09-07T09:35:01.0171986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0077s] [ 59%] 2025-09-07T09:35:01.0172241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0172494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 59%] 2025-09-07T09:35:01.0172746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 59%] 2025-09-07T09:35:01.0173003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 59%] 2025-09-07T09:35:01.0173260Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 59%] 2025-09-07T09:35:01.0173510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 59%] 2025-09-07T09:35:01.0173764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0174026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0025s] [ 59%] 2025-09-07T09:35:01.0175391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 59%] 2025-09-07T09:35:01.0175648Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 59%] 2025-09-07T09:35:01.0175901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0176155Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 59%] 2025-09-07T09:35:01.0176421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0176731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 59%] 2025-09-07T09:35:01.0176981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 59%] 2025-09-07T09:35:01.0177233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 59%] 2025-09-07T09:35:01.0177489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 59%] 2025-09-07T09:35:01.0177741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 59%] 2025-09-07T09:35:01.0177992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0028s] [ 59%] 2025-09-07T09:35:01.0178239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 59%] 2025-09-07T09:35:01.0178493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0178744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 59%] 2025-09-07T09:35:01.0179042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 59%] 2025-09-07T09:35:01.0179308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 59%] 2025-09-07T09:35:01.0179575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 59%] 2025-09-07T09:35:01.0179822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 59%] 2025-09-07T09:35:01.0180071Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 59%] 2025-09-07T09:35:01.0180322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0064s] [ 59%] 2025-09-07T09:35:01.0180592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 59%] 2025-09-07T09:35:01.0181962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 59%] 2025-09-07T09:35:01.0182214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0182463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 59%] 2025-09-07T09:35:01.0182716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 59%] 2025-09-07T09:35:01.0182964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0034s] [ 59%] 2025-09-07T09:35:01.0183216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 59%] 2025-09-07T09:35:01.0183464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 59%] 2025-09-07T09:35:01.0183714Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 59%] 2025-09-07T09:35:01.0183964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 59%] 2025-09-07T09:35:01.0184212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 59%] 2025-09-07T09:35:01.0184478Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 59%] 2025-09-07T09:35:01.0186823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 59%] 2025-09-07T09:35:01.0187073Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 59%] 2025-09-07T09:35:01.0187323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 59%] 2025-09-07T09:35:01.0187573Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 60%] 2025-09-07T09:35:01.0187845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 60%] 2025-09-07T09:35:01.0188113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 60%] 2025-09-07T09:35:01.0188367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 60%] 2025-09-07T09:35:01.0188615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 60%] 2025-09-07T09:35:01.0188864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED 
[0.0023s] [ 60%] 2025-09-07T09:35:01.0190251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 60%] 2025-09-07T09:35:01.0190503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 60%] 2025-09-07T09:35:01.0190755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 60%] 2025-09-07T09:35:01.0191010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 60%] 2025-09-07T09:35:01.0191258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 60%] 2025-09-07T09:35:01.0191510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 60%] 2025-09-07T09:35:01.0191781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 60%] 2025-09-07T09:35:01.0192048Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 60%] 2025-09-07T09:35:01.0192298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 60%] 2025-09-07T09:35:01.0192548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 60%] 2025-09-07T09:35:01.0192796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 60%] 2025-09-07T09:35:01.0193063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 60%] 2025-09-07T09:35:01.0193322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 60%] 2025-09-07T09:35:01.0193568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 60%] 2025-09-07T09:35:01.0193819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 
60%] 2025-09-07T09:35:01.0194073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 60%] 2025-09-07T09:35:01.0194323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 60%] 2025-09-07T09:35:01.0194570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 60%] 2025-09-07T09:35:01.0194817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 60%] 2025-09-07T09:35:01.0195066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 60%] 2025-09-07T09:35:01.0195318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 60%] 2025-09-07T09:35:01.0196720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 60%] 2025-09-07T09:35:01.0196993Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 60%] 2025-09-07T09:35:01.0197264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 60%] 2025-09-07T09:35:01.0197513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 60%] 2025-09-07T09:35:01.0197761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 60%] 2025-09-07T09:35:01.0198012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 60%] 2025-09-07T09:35:01.0198283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 60%] 2025-09-07T09:35:01.0198546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 60%] 2025-09-07T09:35:01.0198797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0027s] [ 60%] 2025-09-07T09:35:01.0199045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 60%] 2025-09-07T09:35:01.0199297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 60%] 2025-09-07T09:35:01.0199549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 60%] 2025-09-07T09:35:01.0199800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 60%] 2025-09-07T09:35:01.0200047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 60%] 2025-09-07T09:35:01.0200297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 60%] 2025-09-07T09:35:01.0200543Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 60%] 2025-09-07T09:35:01.0200791Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 60%] 2025-09-07T09:35:01.0201052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 60%] 2025-09-07T09:35:01.0201318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 60%] 2025-09-07T09:35:01.0201569Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0040s] [ 60%] 2025-09-07T09:35:01.0202910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 60%] 2025-09-07T09:35:01.0203160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 61%] 2025-09-07T09:35:01.0203426Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0203691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0081s] [ 61%] 2025-09-07T09:35:01.0203944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 61%] 2025-09-07T09:35:01.0204192Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0069s] [ 61%] 2025-09-07T09:35:01.0204443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 61%] 2025-09-07T09:35:01.0204693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 61%] 2025-09-07T09:35:01.0204945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 61%] 2025-09-07T09:35:01.0205197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 61%] 2025-09-07T09:35:01.0205454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 61%] 2025-09-07T09:35:01.0205703Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0049s] [ 61%] 2025-09-07T09:35:01.0205956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 61%] 2025-09-07T09:35:01.0206218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 61%] 2025-09-07T09:35:01.0206539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0206791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0073s] [ 61%] 2025-09-07T09:35:01.0207045Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 61%] 2025-09-07T09:35:01.0207293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 61%] 2025-09-07T09:35:01.0207572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED 
[0.0029s] [ 61%] 2025-09-07T09:35:01.0207836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 61%] 2025-09-07T09:35:01.0208085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0209431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 61%] 2025-09-07T09:35:01.0209688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 61%] 2025-09-07T09:35:01.0209937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 61%] 2025-09-07T09:35:01.0210187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 61%] 2025-09-07T09:35:01.0210438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0210690Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0210939Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 61%] 2025-09-07T09:35:01.0211191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 61%] 2025-09-07T09:35:01.0211459Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 61%] 2025-09-07T09:35:01.0211735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 61%] 2025-09-07T09:35:01.0211985Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0022s] [ 61%] 2025-09-07T09:35:01.0212234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0021s] [ 61%] 2025-09-07T09:35:01.0212485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0026s] [ 61%] 2025-09-07T09:35:01.0212753Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 61%] 2025-09-07T09:35:01.0213013Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 61%] 2025-09-07T09:35:01.0213262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0213508Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 61%] 2025-09-07T09:35:01.0213760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 61%] 2025-09-07T09:35:01.0214012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 61%] 2025-09-07T09:35:01.0214262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 61%] 2025-09-07T09:35:01.0215591Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 61%] 2025-09-07T09:35:01.0215845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0216090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 61%] 2025-09-07T09:35:01.0216338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 61%] 2025-09-07T09:35:01.0216652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 61%] 2025-09-07T09:35:01.0216928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 61%] 2025-09-07T09:35:01.0217194Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 61%] 2025-09-07T09:35:01.0217444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 
61%] 2025-09-07T09:35:01.0217690Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 62%] 2025-09-07T09:35:01.0217941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0218207Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 62%] 2025-09-07T09:35:01.0218476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 62%] 2025-09-07T09:35:01.0218722Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 62%] 2025-09-07T09:35:01.0219002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 62%] 2025-09-07T09:35:01.0219251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 62%] 2025-09-07T09:35:01.0219499Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 62%] 2025-09-07T09:35:01.0219749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 62%] 2025-09-07T09:35:01.0220002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0220255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0220504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0220750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 62%] 2025-09-07T09:35:01.0222115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 62%] 2025-09-07T09:35:01.0222382Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] 
[ 62%] 2025-09-07T09:35:01.0222635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 62%] 2025-09-07T09:35:01.0222881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 62%] 2025-09-07T09:35:01.0223129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 62%] 2025-09-07T09:35:01.0223387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 62%] 2025-09-07T09:35:01.0223649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 62%] 2025-09-07T09:35:01.0223898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 62%] 2025-09-07T09:35:01.0224147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 62%] 2025-09-07T09:35:01.0224394Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 62%] 2025-09-07T09:35:01.0224642Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 62%] 2025-09-07T09:35:01.0224886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0040s] [ 62%] 2025-09-07T09:35:01.0225133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0225384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 62%] 2025-09-07T09:35:01.0225638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 62%] 2025-09-07T09:35:01.0225889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 62%] 2025-09-07T09:35:01.0226149Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 
62%] 2025-09-07T09:35:01.0226411Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0226708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 62%] 2025-09-07T09:35:01.0226958Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0228302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0228578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0228845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0229098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0025s] [ 62%] 2025-09-07T09:35:01.0229348Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 62%] 2025-09-07T09:35:01.0229598Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 62%] 2025-09-07T09:35:01.0229849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 62%] 2025-09-07T09:35:01.0230094Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 62%] 2025-09-07T09:35:01.0230341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 62%] 2025-09-07T09:35:01.0230589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 62%] 2025-09-07T09:35:01.0230836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 62%] 2025-09-07T09:35:01.0231085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 
62%] 2025-09-07T09:35:01.0231361Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 62%] 2025-09-07T09:35:01.0231624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 62%] 2025-09-07T09:35:01.0231873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 62%] 2025-09-07T09:35:01.0232120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 63%] 2025-09-07T09:35:01.0232369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 63%] 2025-09-07T09:35:01.0232638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 63%] 2025-09-07T09:35:01.0232905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0233154Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 63%] 2025-09-07T09:35:01.0233407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0234743Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 63%] 2025-09-07T09:35:01.0234999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 63%] 2025-09-07T09:35:01.0235251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0235507Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0235759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0236011Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 63%] 2025-09-07T09:35:01.0236261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 63%] 2025-09-07T09:35:01.0236615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 63%] 2025-09-07T09:35:01.0236886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 63%] 2025-09-07T09:35:01.0237141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0237387Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0237637Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 63%] 2025-09-07T09:35:01.0237904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 
PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0238170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 63%] 2025-09-07T09:35:01.0238422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0238677Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 63%] 2025-09-07T09:35:01.0238931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 63%] 2025-09-07T09:35:01.0239184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 63%] 2025-09-07T09:35:01.0239433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 63%] 2025-09-07T09:35:01.0239684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0241028Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0241283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0241532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0241802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 63%] 2025-09-07T09:35:01.0242064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0242318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 63%] 2025-09-07T09:35:01.0242570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0242825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 
PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0243085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 63%] 2025-09-07T09:35:01.0243348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0243596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0243848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 63%] 2025-09-07T09:35:01.0244098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 63%] 2025-09-07T09:35:01.0244351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 63%] 2025-09-07T09:35:01.0244600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0038s] [ 63%] 2025-09-07T09:35:01.0244850Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 63%] 2025-09-07T09:35:01.0245098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0245348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 63%] 2025-09-07T09:35:01.0245597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 63%] 2025-09-07T09:35:01.0245861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 63%] 2025-09-07T09:35:01.0246122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0247528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 63%] 2025-09-07T09:35:01.0247777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED 
[0.0033s] [ 64%] 2025-09-07T09:35:01.0248030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0248314Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0248590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 64%] 2025-09-07T09:35:01.0248838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0249089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 64%] 2025-09-07T09:35:01.0249337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 64%] 2025-09-07T09:35:01.0249585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 64%] 2025-09-07T09:35:01.0249836Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 64%] 2025-09-07T09:35:01.0250090Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 64%] 2025-09-07T09:35:01.0250338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 64%] 2025-09-07T09:35:01.0250587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 64%] 2025-09-07T09:35:01.0250834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 64%] 2025-09-07T09:35:01.0251102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 64%] 2025-09-07T09:35:01.0251369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 64%] 2025-09-07T09:35:01.0251618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED 
[0.0027s] [ 64%] 2025-09-07T09:35:01.0251863Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 64%] 2025-09-07T09:35:01.0252111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 64%] 2025-09-07T09:35:01.0252370Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 64%] 2025-09-07T09:35:01.0253726Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 64%] 2025-09-07T09:35:01.0253977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 64%] 2025-09-07T09:35:01.0254230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0254477Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0044s] [ 64%] 2025-09-07T09:35:01.0254726Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0045s] [ 64%] 2025-09-07T09:35:01.0254972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 64%] 2025-09-07T09:35:01.0255220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 64%] 2025-09-07T09:35:01.0255474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 64%] 2025-09-07T09:35:01.0255728Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0255974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 64%] 2025-09-07T09:35:01.0256242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 64%] 2025-09-07T09:35:01.0256566Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED 
[0.0024s] [ 64%] 2025-09-07T09:35:01.0256816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 64%] 2025-09-07T09:35:01.0257066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0257321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 64%] 2025-09-07T09:35:01.0257597Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 64%] 2025-09-07T09:35:01.0257876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 64%] 2025-09-07T09:35:01.0258122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 64%] 2025-09-07T09:35:01.0258374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 64%] 2025-09-07T09:35:01.0258624Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 64%] 2025-09-07T09:35:01.0258876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 64%] 2025-09-07T09:35:01.0260273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 64%] 2025-09-07T09:35:01.0260526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 64%] 2025-09-07T09:35:01.0260775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 64%] 2025-09-07T09:35:01.0261029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 64%] 2025-09-07T09:35:01.0261283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 64%] 2025-09-07T09:35:01.0261561Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED 
[0.0030s] [ 64%] 2025-09-07T09:35:01.0261824Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 64%] 2025-09-07T09:35:01.0262072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 64%] 2025-09-07T09:35:01.0262317Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 65%] 2025-09-07T09:35:01.0262567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 65%] 2025-09-07T09:35:01.0262829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 65%] 2025-09-07T09:35:01.0263092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0263339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 65%] 2025-09-07T09:35:01.0263587Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0263835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 65%] 2025-09-07T09:35:01.0264083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 65%] 2025-09-07T09:35:01.0264332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 65%] 2025-09-07T09:35:01.0264584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 65%] 2025-09-07T09:35:01.0264833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 65%] 2025-09-07T09:35:01.0265081Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 65%] 2025-09-07T09:35:01.0266405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 
65%] 2025-09-07T09:35:01.0266771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 65%] 2025-09-07T09:35:01.0267038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 65%] 2025-09-07T09:35:01.0267288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 65%] 2025-09-07T09:35:01.0267532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 65%] 2025-09-07T09:35:01.0267781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 65%] 2025-09-07T09:35:01.0268063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 65%] 2025-09-07T09:35:01.0268327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0268574Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 65%] 2025-09-07T09:35:01.0268825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0269073Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 65%] 2025-09-07T09:35:01.0269320Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0269563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0269810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0270059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0270309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 
65%] 2025-09-07T09:35:01.0270553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 65%] 2025-09-07T09:35:01.0270811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0271070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 65%] 2025-09-07T09:35:01.0271319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 65%] 2025-09-07T09:35:01.0271565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 65%] 2025-09-07T09:35:01.0272913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 65%] 2025-09-07T09:35:01.0273177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0273437Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 65%] 2025-09-07T09:35:01.0273680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0033s] [ 65%] 2025-09-07T09:35:01.0273926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 65%] 2025-09-07T09:35:01.0274175Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 65%] 2025-09-07T09:35:01.0274427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0274668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 65%] 2025-09-07T09:35:01.0274914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 65%] 2025-09-07T09:35:01.0275158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 65%] 
2025-09-07T09:35:01.0275404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 65%] 2025-09-07T09:35:01.0275651Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 65%] 2025-09-07T09:35:01.0275898Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0276160Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 65%] 2025-09-07T09:35:01.0276423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 65%] 2025-09-07T09:35:01.0276751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 66%] 2025-09-07T09:35:01.0276996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 66%] 2025-09-07T09:35:01.0277243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0023s] [ 66%] 2025-09-07T09:35:01.0277516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 66%] 2025-09-07T09:35:01.0277776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 66%] 2025-09-07T09:35:01.0279119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 66%] 2025-09-07T09:35:01.0279364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 66%] 2025-09-07T09:35:01.0279611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 66%] 2025-09-07T09:35:01.0279860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 66%] 2025-09-07T09:35:01.0280109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 66%] 2025-09-07T09:35:01.0280353Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 66%] 2025-09-07T09:35:01.0280602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 66%] 2025-09-07T09:35:01.0280845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 66%] 2025-09-07T09:35:01.0281091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 66%] 2025-09-07T09:35:01.0281358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 66%] 2025-09-07T09:35:01.0281624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 66%] 2025-09-07T09:35:01.0281869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 66%] 2025-09-07T09:35:01.0282114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 66%] 
2025-09-07T09:35:01.0282358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 66%] 2025-09-07T09:35:01.0282615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 66%] 2025-09-07T09:35:01.0282874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 66%] 2025-09-07T09:35:01.0283121Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 66%] 2025-09-07T09:35:01.0283366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 66%] 2025-09-07T09:35:01.0283612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 66%] 2025-09-07T09:35:01.0283855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 66%] 2025-09-07T09:35:01.0284101Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED 
[0.0027s] [ 66%] 2025-09-07T09:35:01.0285432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 66%] 2025-09-07T09:35:01.0285686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 66%] 2025-09-07T09:35:01.0285929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 66%] 2025-09-07T09:35:01.0286177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 66%] 2025-09-07T09:35:01.0286439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 66%] 2025-09-07T09:35:01.0286787Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 66%] 2025-09-07T09:35:01.0287035Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 66%] 2025-09-07T09:35:01.0287284Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 66%] 2025-09-07T09:35:01.0287532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 66%] 2025-09-07T09:35:01.0287804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 66%] 2025-09-07T09:35:01.0288063Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 66%] 2025-09-07T09:35:01.0288308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 66%] 2025-09-07T09:35:01.0288556Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0024s] [ 66%] 2025-09-07T09:35:01.0288806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 66%] 2025-09-07T09:35:01.0289048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 66%] 
2025-09-07T09:35:01.0289292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 66%] 2025-09-07T09:35:01.0289533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0033s] [ 66%] 2025-09-07T09:35:01.0289782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 66%] 2025-09-07T09:35:01.0290028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 66%] 2025-09-07T09:35:01.0290277Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 66%] 2025-09-07T09:35:01.0290536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 66%] 2025-09-07T09:35:01.0291901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 66%] 2025-09-07T09:35:01.0292147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED 
[0.0040s] [ 67%] 2025-09-07T09:35:01.0292391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 67%] 2025-09-07T09:35:01.0292649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0584s] [ 67%] 2025-09-07T09:35:01.0292923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0103s] [ 67%] 2025-09-07T09:35:01.0293193Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0093s] [ 67%] 2025-09-07T09:35:01.0293449Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0093s] [ 67%] 2025-09-07T09:35:01.0293703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0105s] [ 67%] 2025-09-07T09:35:01.0293959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0111s] [ 67%] 2025-09-07T09:35:01.0294217Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0116s] [ 67%] 2025-09-07T09:35:01.0294479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0126s] [ 67%] 2025-09-07T09:35:01.0294732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0125s] [ 67%] 2025-09-07T09:35:01.0294989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0126s] [ 67%] 2025-09-07T09:35:01.0295244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0138s] [ 67%] 2025-09-07T09:35:01.0295501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0144s] [ 67%] 2025-09-07T09:35:01.0295769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0118s] [ 67%] 2025-09-07T09:35:01.0296044Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0098s] [ 67%] 2025-09-07T09:35:01.0296296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0096s] [ 67%] 2025-09-07T09:35:01.0296629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0096s] [ 67%] 2025-09-07T09:35:01.0296882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0103s] [ 67%] 2025-09-07T09:35:01.0298255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0111s] [ 67%] 2025-09-07T09:35:01.0298530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0113s] [ 67%] 2025-09-07T09:35:01.0298789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0123s] [ 67%] 2025-09-07T09:35:01.0299087Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0124s] [ 67%] 2025-09-07T09:35:01.0299343Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0123s] [ 67%] 2025-09-07T09:35:01.0299596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0141s] [ 67%] 2025-09-07T09:35:01.0299851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0132s] [ 67%] 2025-09-07T09:35:01.0300105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 67%] 2025-09-07T09:35:01.0300364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0039s] [ 67%] 2025-09-07T09:35:01.0300615Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0040s] [ 67%] 2025-09-07T09:35:01.0300869Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 67%] 2025-09-07T09:35:01.0301137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 67%] 2025-09-07T09:35:01.0301407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 67%] 2025-09-07T09:35:01.0301662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0053s] [ 67%] 2025-09-07T09:35:01.0301919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0053s] [ 67%] 2025-09-07T09:35:01.0302170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0052s] [ 67%] 2025-09-07T09:35:01.0302435Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0052s] [ 67%] 2025-09-07T09:35:01.0302698Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0054s] [ 67%] 2025-09-07T09:35:01.0302951Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0053s] [ 67%] 2025-09-07T09:35:01.0303209Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0043s] [ 67%] 2025-09-07T09:35:01.0303465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0043s] [ 67%] 2025-09-07T09:35:01.0304801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0043s] [ 67%] 2025-09-07T09:35:01.0305054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0042s] [ 67%] 2025-09-07T09:35:01.0305304Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0041s] [ 67%] 2025-09-07T09:35:01.0305558Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0041s] [ 67%] 2025-09-07T09:35:01.0305809Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0055s] [ 67%] 2025-09-07T09:35:01.0306064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0055s] [ 67%] 2025-09-07T09:35:01.0306341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0056s] [ 67%] 2025-09-07T09:35:01.0306681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0056s] [ 67%] 2025-09-07T09:35:01.0306931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0055s] [ 68%] 2025-09-07T09:35:01.0307183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0055s] [ 68%] 2025-09-07T09:35:01.0307438Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0307711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0307979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0040s] [ 68%] 2025-09-07T09:35:01.0308230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0040s] [ 68%] 2025-09-07T09:35:01.0308485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0308740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0308993Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0053s] [ 68%] 2025-09-07T09:35:01.0309248Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0049s] [ 68%] 2025-09-07T09:35:01.0309500Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0050s] [ 68%] 2025-09-07T09:35:01.0309754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0049s] [ 68%] 2025-09-07T09:35:01.0311111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0049s] [ 68%] 2025-09-07T09:35:01.0311366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0050s] [ 68%] 2025-09-07T09:35:01.0311646Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 68%] 2025-09-07T09:35:01.0311921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0041s] [ 68%] 2025-09-07T09:35:01.0312171Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0312422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0312671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0312935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 68%] 2025-09-07T09:35:01.0313199Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0053s] [ 68%] 2025-09-07T09:35:01.0313453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0053s] [ 68%] 2025-09-07T09:35:01.0313706Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0051s] [ 68%] 2025-09-07T09:35:01.0313961Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0052s] [ 68%] 2025-09-07T09:35:01.0314212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0051s] [ 68%] 2025-09-07T09:35:01.0314463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0051s] [ 68%] 2025-09-07T09:35:01.0314718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0996s] [ 68%] 2025-09-07T09:35:01.0314977Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0087s] [ 68%] 2025-09-07T09:35:01.0315228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0086s] [ 68%] 2025-09-07T09:35:01.0315481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0086s] [ 68%] 2025-09-07T09:35:01.0315745Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0095s] [ 68%] 2025-09-07T09:35:01.0316011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0102s] [ 68%] 2025-09-07T09:35:01.0316267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0106s] [ 68%] 2025-09-07T09:35:01.0317718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0115s] [ 68%] 2025-09-07T09:35:01.0318003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0114s] [ 68%] 2025-09-07T09:35:01.0318273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0114s] [ 68%] 2025-09-07T09:35:01.0318523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0124s] [ 68%] 2025-09-07T09:35:01.0318776Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0132s] [ 68%] 2025-09-07T09:35:01.0319032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0110s] [ 68%] 2025-09-07T09:35:01.0319287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0090s] [ 68%] 2025-09-07T09:35:01.0319536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0097s] [ 68%] 2025-09-07T09:35:01.0319790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0087s] [ 68%] 2025-09-07T09:35:01.0320041Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0061s] [ 68%] 2025-09-07T09:35:01.0320293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0063s] [ 68%] 2025-09-07T09:35:01.0320546Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0108s] [ 68%] 2025-09-07T09:35:01.0320819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0116s] [ 68%] 2025-09-07T09:35:01.0321093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0116s] [ 68%] 2025-09-07T09:35:01.0321346Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0117s] [ 68%] 2025-09-07T09:35:01.0321595Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0122s] [ 69%] 2025-09-07T09:35:01.0321851Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0126s] [ 69%] 2025-09-07T09:35:01.0322119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0042s] [ 69%] 2025-09-07T09:35:01.0322396Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 69%] 2025-09-07T09:35:01.0322649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0323999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0324259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0324514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0324771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0036s] [ 69%] 2025-09-07T09:35:01.0325030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0036s] [ 69%] 2025-09-07T09:35:01.0325287Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0036s] [ 69%] 2025-09-07T09:35:01.0325544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0036s] [ 69%] 2025-09-07T09:35:01.0325796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0037s] [ 69%] 2025-09-07T09:35:01.0326069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0037s] [ 69%] 2025-09-07T09:35:01.0326336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 69%] 2025-09-07T09:35:01.0326663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 69%] 2025-09-07T09:35:01.0326914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 69%] 2025-09-07T09:35:01.0327168Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 69%] 2025-09-07T09:35:01.0327442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0327712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0327967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 69%] 2025-09-07T09:35:01.0328225Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 69%] 2025-09-07T09:35:01.0328480Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 69%] 2025-09-07T09:35:01.0328736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 69%] 2025-09-07T09:35:01.0328987Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0038s] [ 69%] 2025-09-07T09:35:01.0329241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0330600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 69%] 2025-09-07T09:35:01.0330861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 69%] 2025-09-07T09:35:01.0331112Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 69%] 2025-09-07T09:35:01.0331396Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 69%] 2025-09-07T09:35:01.0331666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 69%] 2025-09-07T09:35:01.0331920Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 69%] 2025-09-07T09:35:01.0332173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 69%] 2025-09-07T09:35:01.0332431Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 69%] 2025-09-07T09:35:01.0332696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 69%] 2025-09-07T09:35:01.0332965Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 69%] 2025-09-07T09:35:01.0333216Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0024s] [ 69%] 2025-09-07T09:35:01.0333472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 69%] 2025-09-07T09:35:01.0333727Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 69%] 2025-09-07T09:35:01.0333981Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 69%] 2025-09-07T09:35:01.0334231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 69%] 2025-09-07T09:35:01.0334484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 69%] 2025-09-07T09:35:01.0334736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0022s] [ 69%] 2025-09-07T09:35:01.0334989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 69%] 2025-09-07T09:35:01.0335244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 69%] 2025-09-07T09:35:01.0335512Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 69%] 2025-09-07T09:35:01.0336935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 69%] 2025-09-07T09:35:01.0337193Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 69%] 2025-09-07T09:35:01.0337442Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0337696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0337992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0338265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0338521Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0022s] [ 70%] 2025-09-07T09:35:01.0338778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0022s] [ 70%] 2025-09-07T09:35:01.0339078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0339332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 70%] 2025-09-07T09:35:01.0339585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0339841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0340093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0340347Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0340599Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0340874Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0341145Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0341401Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0341649Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0341902Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0342171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 
PASSED [0.0022s] [ 70%] 2025-09-07T09:35:01.0343535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 70%] 2025-09-07T09:35:01.0343790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 70%] 2025-09-07T09:35:01.0344048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0344300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0344552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0344801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0345054Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0345309Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0345565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0345817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0346088Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 70%] 2025-09-07T09:35:01.0346355Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0346681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0346936Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 70%] 2025-09-07T09:35:01.0347194Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0347479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 70%] 2025-09-07T09:35:01.0347751Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 70%] 2025-09-07T09:35:01.0348002Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 70%] 2025-09-07T09:35:01.0348258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 70%] 2025-09-07T09:35:01.0348517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0349864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0350117Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0350371Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 70%] 2025-09-07T09:35:01.0350623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 70%] 2025-09-07T09:35:01.0350877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 70%] 2025-09-07T09:35:01.0351130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 70%] 2025-09-07T09:35:01.0351408Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 70%] 2025-09-07T09:35:01.0351679Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 70%] 2025-09-07T09:35:01.0351934Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 70%] 2025-09-07T09:35:01.0352184Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0032s] [ 71%] 2025-09-07T09:35:01.0352438Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 71%] 2025-09-07T09:35:01.0352708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.1411s] [ 71%] 2025-09-07T09:35:01.0352978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0126s] [ 71%] 2025-09-07T09:35:01.0353230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0128s] [ 71%] 2025-09-07T09:35:01.0353486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0126s] [ 71%] 2025-09-07T09:35:01.0353740Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0563s] [ 71%] 2025-09-07T09:35:01.0353996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1187s] [ 71%] 2025-09-07T09:35:01.0354254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.1120s] [ 71%] 2025-09-07T09:35:01.0354517Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1347s] [ 71%] 2025-09-07T09:35:01.0354773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.1150s] [ 71%] 2025-09-07T09:35:01.0355029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.1317s] [ 71%] 2025-09-07T09:35:01.0356358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.1276s] [ 71%] 2025-09-07T09:35:01.0356718Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.1138s] [ 71%] 2025-09-07T09:35:01.0356990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.1261s] [ 71%] 2025-09-07T09:35:01.0357247Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1067s] [ 71%] 2025-09-07T09:35:01.0357502Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1139s] [ 71%] 2025-09-07T09:35:01.0357775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.1118s] [ 71%] 2025-09-07T09:35:01.0358042Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1188s] [ 71%] 2025-09-07T09:35:01.0358296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0594s] [ 71%] 2025-09-07T09:35:01.0358551Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0156s] [ 71%] 2025-09-07T09:35:01.0358810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0304s] [ 71%] 2025-09-07T09:35:01.0359062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.1003s] [ 71%] 2025-09-07T09:35:01.0359318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.1015s] [ 71%] 2025-09-07T09:35:01.0359571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.1121s] [ 71%] 2025-09-07T09:35:01.0359827Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0738s] [ 71%] 2025-09-07T09:35:01.0360083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0061s] [ 71%] 2025-09-07T09:35:01.0360338Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0067s] [ 71%] 2025-09-07T09:35:01.0360602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0526s] [ 71%] 2025-09-07T09:35:01.0360878Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0610s] [ 71%] 2025-09-07T09:35:01.0361129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0566s] [ 71%] 2025-09-07T09:35:01.0361381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0616s] [ 71%] 2025-09-07T09:35:01.0362734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0646s] [ 71%] 2025-09-07T09:35:01.0363011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0908s] [ 71%] 2025-09-07T09:35:01.0363274Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0825s] [ 71%] 2025-09-07T09:35:01.0363527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0746s] [ 71%] 2025-09-07T09:35:01.0363777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0890s] [ 71%] 2025-09-07T09:35:01.0364032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0808s] [ 71%] 2025-09-07T09:35:01.0364286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0718s] [ 71%] 2025-09-07T09:35:01.0364540Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0632s] [ 71%] 2025-09-07T09:35:01.0364792Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0201s] [ 71%] 2025-09-07T09:35:01.0365049Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0185s] [ 71%] 2025-09-07T09:35:01.0365300Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0252s] [ 71%] 2025-09-07T09:35:01.0365551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0134s] [ 71%] 2025-09-07T09:35:01.0365816Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0234s] [ 71%] 2025-09-07T09:35:01.0366084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0674s] [ 71%] 2025-09-07T09:35:01.0366335Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0295s] [ 71%] 2025-09-07T09:35:01.0366655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0271s] [ 71%] 2025-09-07T09:35:01.0366906Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0219s] [ 72%] 2025-09-07T09:35:01.0367183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0334s] [ 72%] 2025-09-07T09:35:01.0367453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0059s] [ 72%] 2025-09-07T09:35:01.0367708Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0058s] [ 72%] 2025-09-07T09:35:01.0367959Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0059s] [ 72%] 2025-09-07T09:35:01.0369309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0059s] [ 72%] 2025-09-07T09:35:01.0369562Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0059s] [ 72%] 2025-09-07T09:35:01.0369815Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0060s] [ 72%] 2025-09-07T09:35:01.0370069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0083s] [ 72%] 2025-09-07T09:35:01.0370326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0082s] [ 72%] 2025-09-07T09:35:01.0370578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0082s] [ 72%] 2025-09-07T09:35:01.0370831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0114s] [ 72%] 2025-09-07T09:35:01.0371107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0091s] [ 72%] 2025-09-07T09:35:01.0371376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0102s] [ 72%] 2025-09-07T09:35:01.0371631Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0074s] [ 72%] 2025-09-07T09:35:01.0371884Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0092s] [ 72%] 2025-09-07T09:35:01.0372133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0069s] [ 72%] 2025-09-07T09:35:01.0372398Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0066s] [ 72%] 2025-09-07T09:35:01.0372657Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0065s] [ 72%] 2025-09-07T09:35:01.0372910Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0065s] [ 72%] 2025-09-07T09:35:01.0373162Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0085s] [ 72%] 2025-09-07T09:35:01.0373417Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0085s] [ 72%] 2025-09-07T09:35:01.0373665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0084s] [ 72%] 2025-09-07T09:35:01.0373916Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0092s] [ 72%] 2025-09-07T09:35:01.0374165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0628s] [ 72%] 2025-09-07T09:35:01.0375495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0875s] [ 72%] 2025-09-07T09:35:01.0375752Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [1.8053s] [ 72%] 2025-09-07T09:35:01.0376010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1166s] [ 72%] 2025-09-07T09:35:01.0376284Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1187s] [ 72%] 2025-09-07T09:35:01.0376618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.1212s] [ 72%] 2025-09-07T09:35:01.0376873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1153s] [ 72%] 2025-09-07T09:35:01.0377125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1291s] [ 72%] 2025-09-07T09:35:01.0377379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.1233s] [ 72%] 2025-09-07T09:35:01.0377657Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1352s] [ 72%] 2025-09-07T09:35:01.0377923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.1123s] [ 72%] 2025-09-07T09:35:01.0378175Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.1346s] [ 72%] 2025-09-07T09:35:01.0378432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.1188s] [ 72%] 2025-09-07T09:35:01.0378686Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.1557s] [ 72%] 2025-09-07T09:35:01.0378940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.1412s] [ 72%] 2025-09-07T09:35:01.0379266Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1220s] [ 72%] 2025-09-07T09:35:01.0379516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1143s] [ 72%] 2025-09-07T09:35:01.0379770Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.1259s] [ 72%] 2025-09-07T09:35:01.0380021Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0095s] [ 72%] 2025-09-07T09:35:01.0380273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0093s] [ 72%] 2025-09-07T09:35:01.0380549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.1137s] [ 72%] 2025-09-07T09:35:01.0380822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1324s] [ 72%] 2025-09-07T09:35:01.0382183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.1068s] [ 72%] 2025-09-07T09:35:01.0382436Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.1234s] [ 72%] 2025-09-07T09:35:01.0382688Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.1277s] [ 73%] 2025-09-07T09:35:01.0382960Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.1439s] [ 73%] 2025-09-07T09:35:01.0383226Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 73%] 2025-09-07T09:35:01.0383481Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 73%] 2025-09-07T09:35:01.0383732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0046s] [ 73%] 2025-09-07T09:35:01.0383990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 73%] 2025-09-07T09:35:01.0384241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 73%] 2025-09-07T09:35:01.0384494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 73%] 2025-09-07T09:35:01.0384747Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 73%] 2025-09-07T09:35:01.0385004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0038s] [ 73%] 2025-09-07T09:35:01.0385256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 73%] 2025-09-07T09:35:01.0385509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 73%] 2025-09-07T09:35:01.0385773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0037s] [ 73%] 2025-09-07T09:35:01.0386044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0037s] [ 73%] 2025-09-07T09:35:01.0386297Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 73%] 2025-09-07T09:35:01.0386611Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 73%] 2025-09-07T09:35:01.0386860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0035s] [ 73%] 2025-09-07T09:35:01.0387129Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 73%] 2025-09-07T09:35:01.0388509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0034s] [ 73%] 2025-09-07T09:35:01.0388762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 73%] 2025-09-07T09:35:01.0389015Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0039s] [ 73%] 2025-09-07T09:35:01.0389272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0040s] [ 73%] 2025-09-07T09:35:01.0389526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 
PASSED [0.0040s] [ 73%] 2025-09-07T09:35:01.0389777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 73%] 2025-09-07T09:35:01.0390025Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0039s] [ 73%] 2025-09-07T09:35:01.0390278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 73%] 2025-09-07T09:35:01.0390530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 73%] 2025-09-07T09:35:01.0390783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 73%] 2025-09-07T09:35:01.0391055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 73%] 2025-09-07T09:35:01.0391321Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 73%] 2025-09-07T09:35:01.0391572Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 73%] 2025-09-07T09:35:01.0391823Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 73%] 2025-09-07T09:35:01.0392076Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 73%] 2025-09-07T09:35:01.0392342Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 73%] 2025-09-07T09:35:01.0392605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 73%] 2025-09-07T09:35:01.0392857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 73%] 2025-09-07T09:35:01.0393106Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 73%] 2025-09-07T09:35:01.0393359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 
PASSED [0.0029s] [ 73%] 2025-09-07T09:35:01.0393608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 73%] 2025-09-07T09:35:01.0394941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 73%] 2025-09-07T09:35:01.0395191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 73%] 2025-09-07T09:35:01.0395441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 73%] 2025-09-07T09:35:01.0395691Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 73%] 2025-09-07T09:35:01.0395940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 73%] 2025-09-07T09:35:01.0396214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 73%] 2025-09-07T09:35:01.0396546Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 73%] 2025-09-07T09:35:01.0396794Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 73%] 2025-09-07T09:35:01.0397044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 73%] 2025-09-07T09:35:01.0397298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 74%] 2025-09-07T09:35:01.0397576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 74%] 2025-09-07T09:35:01.0397850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 74%] 2025-09-07T09:35:01.0398103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 74%] 2025-09-07T09:35:01.0398355Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED 
[0.0025s] [ 74%] 2025-09-07T09:35:01.0398606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0398854Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 74%] 2025-09-07T09:35:01.0399103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 74%] 2025-09-07T09:35:01.0399357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 74%] 2025-09-07T09:35:01.0399613Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 74%] 2025-09-07T09:35:01.0399861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 74%] 2025-09-07T09:35:01.0401954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 74%] 2025-09-07T09:35:01.0402229Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0402509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 74%] 2025-09-07T09:35:01.0402761Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0403012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0403260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0403525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 74%] 2025-09-07T09:35:01.0403786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0404036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 
74%] 2025-09-07T09:35:01.0404286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 74%] 2025-09-07T09:35:01.0404541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 74%] 2025-09-07T09:35:01.0404788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0033s] [ 74%] 2025-09-07T09:35:01.0405038Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 74%] 2025-09-07T09:35:01.0405285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 74%] 2025-09-07T09:35:01.0405536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 74%] 2025-09-07T09:35:01.0405790Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0406043Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0406303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0406639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0408234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 74%] 2025-09-07T09:35:01.0408487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 74%] 2025-09-07T09:35:01.0408744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0409026Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 74%] 2025-09-07T09:35:01.0409296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 
PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0409546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0409795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 74%] 2025-09-07T09:35:01.0410048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 74%] 2025-09-07T09:35:01.0410298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0410550Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0410800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 74%] 2025-09-07T09:35:01.0411052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 74%] 2025-09-07T09:35:01.0411301Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 74%] 2025-09-07T09:35:01.0411553Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 74%] 2025-09-07T09:35:01.0411830Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 74%] 2025-09-07T09:35:01.0412100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 74%] 2025-09-07T09:35:01.0412348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 74%] 2025-09-07T09:35:01.0412596Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0412842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 75%] 2025-09-07T09:35:01.0413103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_1024_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED 
[0.0028s] [ 75%] 2025-09-07T09:35:01.0413373Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0414810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 75%] 2025-09-07T09:35:01.0415068Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 75%] 2025-09-07T09:35:01.0415326Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 75%] 2025-09-07T09:35:01.0415582Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 75%] 2025-09-07T09:35:01.0415836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 75%] 2025-09-07T09:35:01.0416091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0416351Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0416673Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0416927Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 75%] 2025-09-07T09:35:01.0417211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0033s] [ 75%] 2025-09-07T09:35:01.0417483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0033s] [ 75%] 2025-09-07T09:35:01.0417741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 75%] 2025-09-07T09:35:01.0417997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 75%] 2025-09-07T09:35:01.0418250Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 75%] 2025-09-07T09:35:01.0418529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 75%] 2025-09-07T09:35:01.0418796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 75%] 2025-09-07T09:35:01.0419107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 75%] 2025-09-07T09:35:01.0419364Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 75%] 2025-09-07T09:35:01.0419621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0419873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0035s] [ 75%] 2025-09-07T09:35:01.0421289Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0421545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0034s] [ 75%] 2025-09-07T09:35:01.0421800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 75%] 2025-09-07T09:35:01.0422052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 75%] 2025-09-07T09:35:01.0422308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 75%] 2025-09-07T09:35:01.0422576Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 75%] 2025-09-07T09:35:01.0422845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 75%] 2025-09-07T09:35:01.0423098Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 75%] 2025-09-07T09:35:01.0423348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 75%] 2025-09-07T09:35:01.0423618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 75%] 2025-09-07T09:35:01.0423886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 75%] 2025-09-07T09:35:01.0424137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 75%] 2025-09-07T09:35:01.0424389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 75%] 2025-09-07T09:35:01.0424641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 75%] 2025-09-07T09:35:01.0424895Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 75%] 2025-09-07T09:35:01.0425146Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 75%] 2025-09-07T09:35:01.0425402Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 75%] 2025-09-07T09:35:01.0425652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 75%] 2025-09-07T09:35:01.0425906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 75%] 2025-09-07T09:35:01.0426153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 75%] 2025-09-07T09:35:01.0426419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 75%] 2025-09-07T09:35:01.0427869Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 75%] 2025-09-07T09:35:01.0428130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 75%] 2025-09-07T09:35:01.0428384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 75%] 2025-09-07T09:35:01.0428635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0428919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0429187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0429439Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0429694Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0429945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 76%] 2025-09-07T09:35:01.0430197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 76%] 2025-09-07T09:35:01.0430450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 76%] 2025-09-07T09:35:01.0430703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 76%] 2025-09-07T09:35:01.0430957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0431211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0431461Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0431738Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0432005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0432257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0432511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0432767Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0434147Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0434413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 
PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0434662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0434912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 76%] 2025-09-07T09:35:01.0435167Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0435422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0435669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0435924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 76%] 2025-09-07T09:35:01.0436174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0436428Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0436759Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0437044Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0437318Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 76%] 2025-09-07T09:35:01.0437570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0437820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0438072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 76%] 2025-09-07T09:35:01.0438347Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0438619Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0438870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0439122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 76%] 2025-09-07T09:35:01.0439375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0440748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0441004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0441259Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 76%] 2025-09-07T09:35:01.0441510Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0441763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 76%] 2025-09-07T09:35:01.0442010Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0442280Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 76%] 2025-09-07T09:35:01.0442546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 76%] 2025-09-07T09:35:01.0442802Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 76%] 2025-09-07T09:35:01.0443051Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 76%] 2025-09-07T09:35:01.0443303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 77%] 2025-09-07T09:35:01.0443565Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 77%] 2025-09-07T09:35:01.0443832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0444086Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0444341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 77%] 2025-09-07T09:35:01.0444594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 77%] 2025-09-07T09:35:01.0444847Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 77%] 2025-09-07T09:35:01.0445099Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 77%] 2025-09-07T09:35:01.0445352Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 77%] 2025-09-07T09:35:01.0445609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 77%] 2025-09-07T09:35:01.0447019Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 77%] 2025-09-07T09:35:01.0447275Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0447552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0447820Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0448075Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0448329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 77%] 2025-09-07T09:35:01.0448585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 77%] 2025-09-07T09:35:01.0448852Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 77%] 2025-09-07T09:35:01.0449122Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0449375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 77%] 2025-09-07T09:35:01.0449628Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 77%] 2025-09-07T09:35:01.0449882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 77%] 2025-09-07T09:35:01.0450137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0450386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0450638Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0450890Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 77%] 2025-09-07T09:35:01.0451141Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0451396Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 77%] 2025-09-07T09:35:01.0451663Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 77%] 2025-09-07T09:35:01.0451930Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 77%] 2025-09-07T09:35:01.0452183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 77%] 2025-09-07T09:35:01.0453539Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 77%] 2025-09-07T09:35:01.0453797Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 77%] 2025-09-07T09:35:01.0454066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0454334Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0454583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0454833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 77%] 2025-09-07T09:35:01.0455084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 77%] 2025-09-07T09:35:01.0455336Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 77%] 2025-09-07T09:35:01.0455588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 77%] 2025-09-07T09:35:01.0455840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 77%] 2025-09-07T09:35:01.0456091Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 77%] 2025-09-07T09:35:01.0456341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 77%] 2025-09-07T09:35:01.0456672Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 77%] 2025-09-07T09:35:01.0456947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 77%] 2025-09-07T09:35:01.0457215Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 77%] 2025-09-07T09:35:01.0457469Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 77%] 2025-09-07T09:35:01.0457720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 77%] 2025-09-07T09:35:01.0457971Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0458239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0458506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0459937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0460197Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0460446Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0460696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0460946Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 78%] 2025-09-07T09:35:01.0461197Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 78%] 2025-09-07T09:35:01.0461451Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 78%] 2025-09-07T09:35:01.0461707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0461955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0462224Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0462485Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 78%] 2025-09-07T09:35:01.0462737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 78%] 2025-09-07T09:35:01.0462987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0463240Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0463501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0463765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0464013Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0464264Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0464516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0464769Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0465019Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0466356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0466668Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0466918Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0467172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 78%] 2025-09-07T09:35:01.0467453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0467719Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0467972Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 
PASSED [0.0024s] [ 78%] 2025-09-07T09:35:01.0468222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 78%] 2025-09-07T09:35:01.0468475Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0468749Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0469022Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 78%] 2025-09-07T09:35:01.0469272Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 78%] 2025-09-07T09:35:01.0469527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0469778Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 78%] 2025-09-07T09:35:01.0470030Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 78%] 2025-09-07T09:35:01.0470281Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 78%] 2025-09-07T09:35:01.0470534Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 78%] 2025-09-07T09:35:01.0470782Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0471031Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 78%] 2025-09-07T09:35:01.0471278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0471541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 78%] 2025-09-07T09:35:01.0472920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0039s] [ 78%] 2025-09-07T09:35:01.0473174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 78%] 2025-09-07T09:35:01.0473423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 78%] 2025-09-07T09:35:01.0473674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 79%] 2025-09-07T09:35:01.0473935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 79%] 2025-09-07T09:35:01.0474210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 79%] 2025-09-07T09:35:01.0474464Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.5623s] [ 79%] 2025-09-07T09:35:01.0474723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0087s] [ 79%] 2025-09-07T09:35:01.0474978Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0177s] [ 79%] 2025-09-07T09:35:01.0475234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0608s] [ 79%] 2025-09-07T09:35:01.0475487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0735s] [ 79%] 2025-09-07T09:35:01.0475742Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0550s] [ 79%] 2025-09-07T09:35:01.0475998Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0460s] [ 79%] 2025-09-07T09:35:01.0476256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0475s] [ 79%] 2025-09-07T09:35:01.0476670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0580s] [ 79%] 2025-09-07T09:35:01.0476952Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0628s] [ 79%] 2025-09-07T09:35:01.0477227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0692s] [ 79%] 2025-09-07T09:35:01.0477482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0655s] [ 79%] 2025-09-07T09:35:01.0477737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0115s] [ 79%] 2025-09-07T09:35:01.0477995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0116s] [ 79%] 2025-09-07T09:35:01.0479417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0670s] [ 79%] 2025-09-07T09:35:01.0479696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0547s] [ 79%] 2025-09-07T09:35:01.0479946Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0752s] [ 79%] 2025-09-07T09:35:01.0480201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0625s] [ 79%] 2025-09-07T09:35:01.0480456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0603s] [ 79%] 2025-09-07T09:35:01.0480716Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0674s] [ 79%] 2025-09-07T09:35:01.0480969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0750s] [ 79%] 2025-09-07T09:35:01.0481221Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0564s] [ 79%] 2025-09-07T09:35:01.0481473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0085s] [ 79%] 2025-09-07T09:35:01.0481727Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0073s] [ 79%] 2025-09-07T09:35:01.0481979Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 79%] 2025-09-07T09:35:01.0482249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 79%] 2025-09-07T09:35:01.0482514Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 79%] 2025-09-07T09:35:01.0482768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 79%] 2025-09-07T09:35:01.0483021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 79%] 2025-09-07T09:35:01.0483274Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 79%] 2025-09-07T09:35:01.0483540Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 79%] 2025-09-07T09:35:01.0483810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 79%] 2025-09-07T09:35:01.0484059Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0037s] [ 79%] 2025-09-07T09:35:01.0484312Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 79%] 2025-09-07T09:35:01.0484563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 79%] 2025-09-07T09:35:01.0485915Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0035s] [ 79%] 2025-09-07T09:35:01.0486174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 79%] 2025-09-07T09:35:01.0486429Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 79%] 2025-09-07T09:35:01.0486756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 79%] 2025-09-07T09:35:01.0487007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 79%] 2025-09-07T09:35:01.0487255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 79%] 2025-09-07T09:35:01.0487536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 79%] 2025-09-07T09:35:01.0487806Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0038s] [ 79%] 2025-09-07T09:35:01.0488061Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 79%] 2025-09-07T09:35:01.0488314Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0038s] [ 79%] 2025-09-07T09:35:01.0488567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 80%] 2025-09-07T09:35:01.0488848Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 80%] 2025-09-07T09:35:01.0489115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0036s] [ 80%] 2025-09-07T09:35:01.0489366Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 80%] 2025-09-07T09:35:01.0489623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 80%] 2025-09-07T09:35:01.0489873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 80%] 2025-09-07T09:35:01.0490126Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 80%] 2025-09-07T09:35:01.0490376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 80%] 2025-09-07T09:35:01.0490632Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 80%] 2025-09-07T09:35:01.0490886Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 80%] 2025-09-07T09:35:01.0492249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 80%] 2025-09-07T09:35:01.0492523Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 80%] 2025-09-07T09:35:01.0492788Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 80%] 2025-09-07T09:35:01.0493039Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 80%] 2025-09-07T09:35:01.0493292Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 80%] 2025-09-07T09:35:01.0493546Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 80%] 2025-09-07T09:35:01.0493822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 80%] 2025-09-07T09:35:01.0494082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 80%] 2025-09-07T09:35:01.0494332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 80%] 2025-09-07T09:35:01.0494579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 80%] 2025-09-07T09:35:01.0494831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 
PASSED [0.0025s] [ 80%] 2025-09-07T09:35:01.0495083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 80%] 2025-09-07T09:35:01.0495338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 80%] 2025-09-07T09:35:01.0495587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 80%] 2025-09-07T09:35:01.0495839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 80%] 2025-09-07T09:35:01.0496089Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 80%] 2025-09-07T09:35:01.0496341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 80%] 2025-09-07T09:35:01.0496670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [1.0977s] [ 80%] 2025-09-07T09:35:01.0496941Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0329s] [ 80%] 2025-09-07T09:35:01.0497191Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0712s] [ 80%] 2025-09-07T09:35:01.0497441Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0486s] [ 80%] 2025-09-07T09:35:01.0498801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0713s] [ 80%] 2025-09-07T09:35:01.0499118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0492s] [ 80%] 2025-09-07T09:35:01.0499390Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0715s] [ 80%] 2025-09-07T09:35:01.0499647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0649s] [ 80%] 2025-09-07T09:35:01.0499897Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0732s] [ 80%] 2025-09-07T09:35:01.0500153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0745s] [ 80%] 2025-09-07T09:35:01.0500404Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0689s] [ 80%] 2025-09-07T09:35:01.0500657Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0864s] [ 80%] 2025-09-07T09:35:01.0500908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0688s] [ 80%] 2025-09-07T09:35:01.0501164Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0643s] [ 80%] 2025-09-07T09:35:01.0501415Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0676s] [ 80%] 2025-09-07T09:35:01.0501670Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0570s] [ 80%] 2025-09-07T09:35:01.0501931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0038s] [ 80%] 2025-09-07T09:35:01.0502195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 80%] 2025-09-07T09:35:01.0502448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0390s] [ 80%] 2025-09-07T09:35:01.0502701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0533s] [ 80%] 2025-09-07T09:35:01.0502953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0848s] [ 80%] 2025-09-07T09:35:01.0503218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0449s] [ 81%] 2025-09-07T09:35:01.0503485Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0215s] [ 81%] 2025-09-07T09:35:01.0503737Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0230s] [ 81%] 2025-09-07T09:35:01.0505078Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 81%] 2025-09-07T09:35:01.0505337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0505588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0505839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0506087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 81%] 2025-09-07T09:35:01.0506339Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 
PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0506670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 81%] 2025-09-07T09:35:01.0506926Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 81%] 2025-09-07T09:35:01.0507206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 81%] 2025-09-07T09:35:01.0507474Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 81%] 2025-09-07T09:35:01.0507723Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 81%] 2025-09-07T09:35:01.0507973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0508223Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0508492Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0508756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0509008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0509258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 81%] 2025-09-07T09:35:01.0509509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0509758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 81%] 2025-09-07T09:35:01.0510011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 81%] 2025-09-07T09:35:01.0510258Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED 
[0.0032s] [ 81%] 2025-09-07T09:35:01.0511621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 81%] 2025-09-07T09:35:01.0511868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 81%] 2025-09-07T09:35:01.0512118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0512388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 81%] 2025-09-07T09:35:01.0512661Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 81%] 2025-09-07T09:35:01.0512909Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0034s] [ 81%] 2025-09-07T09:35:01.0513157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0513407Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0513670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0513932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 81%] 2025-09-07T09:35:01.0514186Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0514434Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0514687Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 81%] 2025-09-07T09:35:01.0514932Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0515181Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED 
[0.0026s] [ 81%] 2025-09-07T09:35:01.0515428Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 81%] 2025-09-07T09:35:01.0515680Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0515924Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 81%] 2025-09-07T09:35:01.0516171Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 81%] 2025-09-07T09:35:01.0516427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 81%] 2025-09-07T09:35:01.0517856Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 81%] 2025-09-07T09:35:01.0518111Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 81%] 2025-09-07T09:35:01.0518362Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 81%] 2025-09-07T09:35:01.0518612Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 81%] 2025-09-07T09:35:01.0518891Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 82%] 2025-09-07T09:35:01.0519155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 82%] 2025-09-07T09:35:01.0519403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 82%] 2025-09-07T09:35:01.0519653Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0519906Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0520153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] 
[ 82%] 2025-09-07T09:35:01.0520400Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0520644Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0520893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 82%] 2025-09-07T09:35:01.0521142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 82%] 2025-09-07T09:35:01.0521394Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 82%] 2025-09-07T09:35:01.0521639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 82%] 2025-09-07T09:35:01.0521905Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 82%] 2025-09-07T09:35:01.0522171Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 82%] 2025-09-07T09:35:01.0522421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 82%] 2025-09-07T09:35:01.0522671Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 82%] 2025-09-07T09:35:01.0522919Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0524266Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0524530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0524775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 82%] 2025-09-07T09:35:01.0525024Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 82%] 
2025-09-07T09:35:01.0525273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 82%] 2025-09-07T09:35:01.0525527Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 82%] 2025-09-07T09:35:01.0525773Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 82%] 2025-09-07T09:35:01.0526021Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 82%] 2025-09-07T09:35:01.0526267Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 82%] 2025-09-07T09:35:01.0526579Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 82%] 2025-09-07T09:35:01.0526828Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 82%] 2025-09-07T09:35:01.0527100Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 82%] 2025-09-07T09:35:01.0527365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0527618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0527871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 82%] 2025-09-07T09:35:01.0528120Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 82%] 2025-09-07T09:35:01.0528388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 82%] 2025-09-07T09:35:01.0528657Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 82%] 2025-09-07T09:35:01.0528904Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED 
[0.0029s] [ 82%] 2025-09-07T09:35:01.0529153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 82%] 2025-09-07T09:35:01.0530495Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0530750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0531000Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0531252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0531497Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0531745Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 82%] 2025-09-07T09:35:01.0531989Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 82%] 2025-09-07T09:35:01.0532261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 82%] 2025-09-07T09:35:01.0532522Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 82%] 2025-09-07T09:35:01.0532775Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 82%] 2025-09-07T09:35:01.0533023Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 82%] 2025-09-07T09:35:01.0533273Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 83%] 2025-09-07T09:35:01.0533532Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 83%] 2025-09-07T09:35:01.0533791Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_103_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 
83%] 2025-09-07T09:35:01.0534048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0411s] [ 83%] 2025-09-07T09:35:01.0534307Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0132s] [ 83%] 2025-09-07T09:35:01.0534564Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0129s] [ 83%] 2025-09-07T09:35:01.0534819Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0124s] [ 83%] 2025-09-07T09:35:01.0535072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0154s] [ 83%] 2025-09-07T09:35:01.0535328Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0153s] [ 83%] 2025-09-07T09:35:01.0535666Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0046s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0537161Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0537528Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0537873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0034s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0538201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0538526Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0538810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0101s] [ 83%] 2025-09-07T09:35:01.0539159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0133s] [ 83%] 2025-09-07T09:35:01.0539414Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0134s] [ 83%] 2025-09-07T09:35:01.0539669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0194s] [ 83%] 2025-09-07T09:35:01.0539922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0416s] [ 83%] 2025-09-07T09:35:01.0540177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0526s] [ 83%] 2025-09-07T09:35:01.0540503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0416s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0540832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0541159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0541484Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0541825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0130s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0542165Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0542422Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0060s] [ 83%] 2025-09-07T09:35:01.0542681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0061s] [ 83%] 2025-09-07T09:35:01.0542947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0270s] [ 83%] 2025-09-07T09:35:01.0544332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0343s] [ 83%] 2025-09-07T09:35:01.0544591Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0445s] [ 83%] 2025-09-07T09:35:01.0544849Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0340s] [ 83%] 2025-09-07T09:35:01.0545178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0257s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0545506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0545830Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0152s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0546155Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0546555Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0546881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0547176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0068s] [ 83%] 2025-09-07T09:35:01.0547448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0278s] [ 83%] 2025-09-07T09:35:01.0547701Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0375s] [ 83%] 2025-09-07T09:35:01.0547954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0533s] [ 83%] 2025-09-07T09:35:01.0548205Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0390s] [ 83%] 2025-09-07T09:35:01.0548483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0350s] [ 83%] 2025-09-07T09:35:01.0548832Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0282s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 83%] 2025-09-07T09:35:01.0549163Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0549489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 83%] 2025-09-07T09:35:01.0549812Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0550133Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0052s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0551571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0551831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0064s] [ 84%] 2025-09-07T09:35:01.0552085Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0062s] [ 84%] 2025-09-07T09:35:01.0552357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0062s] [ 84%] 2025-09-07T09:35:01.0552624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0062s] [ 84%] 2025-09-07T09:35:01.0552876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0061s] [ 84%] 2025-09-07T09:35:01.0553128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0061s] [ 84%] 2025-09-07T09:35:01.0553456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0553795Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0554132Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0554457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0554781Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0555105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0555358Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0069s] [ 84%] 2025-09-07T09:35:01.0555614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0067s] [ 84%] 2025-09-07T09:35:01.0555864Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0073s] [ 84%] 2025-09-07T09:35:01.0556118Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0067s] [ 84%] 2025-09-07T09:35:01.0556369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0067s] [ 84%] 2025-09-07T09:35:01.0556727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0067s] [ 84%] 2025-09-07T09:35:01.0557072Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0557397Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0558840Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0559185Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0559525Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0559850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0560107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.2039s] [ 84%] 2025-09-07T09:35:01.0560365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0276s] [ 84%] 2025-09-07T09:35:01.0560616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0119s] [ 84%] 2025-09-07T09:35:01.0560870Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0359s] [ 84%] 2025-09-07T09:35:01.0561125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0409s] [ 84%] 2025-09-07T09:35:01.0561379Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0406s] [ 84%] 2025-09-07T09:35:01.0561707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0229s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0562052Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0562389Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0182s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0562715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0563039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0563376Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0563641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0095s] [ 84%] 2025-09-07T09:35:01.0563896Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0335s] [ 84%] 2025-09-07T09:35:01.0564148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0354s] [ 84%] 2025-09-07T09:35:01.0564405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0646s] [ 84%] 2025-09-07T09:35:01.0565750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0105s] [ 84%] 2025-09-07T09:35:01.0566005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0107s] [ 84%] 2025-09-07T09:35:01.0566333Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0089s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 84%] 2025-09-07T09:35:01.0566732Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0567055Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 84%] 2025-09-07T09:35:01.0567401Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0567747Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0183s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0568070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0568327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0047s] [ 85%] 2025-09-07T09:35:01.0568609Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0150s] [ 85%] 2025-09-07T09:35:01.0568882Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0190s] [ 85%] 2025-09-07T09:35:01.0569138Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0150s] [ 85%] 2025-09-07T09:35:01.0569391Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0175s] [ 85%] 2025-09-07T09:35:01.0569647Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0174s] [ 85%] 2025-09-07T09:35:01.0569978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0085s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0570309Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0570636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0203s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0570961Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0571286Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0571625Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0572995Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0044s] [ 85%] 2025-09-07T09:35:01.0573257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0073s] [ 85%] 2025-09-07T09:35:01.0573515Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0074s] [ 85%] 2025-09-07T09:35:01.0573785Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0147s] [ 85%] 2025-09-07T09:35:01.0574051Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0634s] [ 85%] 2025-09-07T09:35:01.0574305Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0460s] [ 85%] 2025-09-07T09:35:01.0574633Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0555s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0574962Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0575289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0575616Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0575941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0411s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0576265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0576585Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 85%] 2025-09-07T09:35:01.0576860Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 85%] 2025-09-07T09:35:01.0577130Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 85%] 2025-09-07T09:35:01.0577385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 85%] 2025-09-07T09:35:01.0577639Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 85%] 2025-09-07T09:35:01.0577895Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 85%] 2025-09-07T09:35:01.0578248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0578594Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0578920Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0580410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0580738Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0581067Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0581327Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 85%] 2025-09-07T09:35:01.0581584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 85%] 2025-09-07T09:35:01.0581836Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 85%] 2025-09-07T09:35:01.0582109Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0032s] [ 85%] 2025-09-07T09:35:01.0582375Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 85%] 2025-09-07T09:35:01.0582628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 85%] 2025-09-07T09:35:01.0582953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 85%] 2025-09-07T09:35:01.0583282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0583618Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 85%] 2025-09-07T09:35:01.0583957Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0584282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0584605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0584861Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0033s] [ 86%] 2025-09-07T09:35:01.0585117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 86%] 2025-09-07T09:35:01.0585369Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 86%] 2025-09-07T09:35:01.0585627Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 86%] 2025-09-07T09:35:01.0585881Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 86%] 2025-09-07T09:35:01.0587291Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 86%] 2025-09-07T09:35:01.0587654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0588011Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0588334Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0588659Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0589015Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0589384Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0589643Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 86%] 2025-09-07T09:35:01.0589900Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 86%] 2025-09-07T09:35:01.0590151Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0036s] [ 86%] 2025-09-07T09:35:01.0590405Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0033s] [ 86%] 2025-09-07T09:35:01.0590655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 86%] 2025-09-07T09:35:01.0590907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 86%] 2025-09-07T09:35:01.0591235Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0591560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0591893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0592228Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0592547Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0592869Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0593137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.6648s] [ 86%] 2025-09-07T09:35:01.0594606Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0771s] [ 86%] 2025-09-07T09:35:01.0594862Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0591s] [ 86%] 2025-09-07T09:35:01.0595119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0780s] [ 86%] 2025-09-07T09:35:01.0595374Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0684s] [ 86%] 2025-09-07T09:35:01.0595628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0754s] [ 86%] 2025-09-07T09:35:01.0595955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0755s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0596285Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0070s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0596682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0208s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0597008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0597360Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0597703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0597958Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0046s] [ 86%] 2025-09-07T09:35:01.0598239Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0571s] [ 86%] 2025-09-07T09:35:01.0598509Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0604s] [ 86%] 2025-09-07T09:35:01.0598779Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.1311s] [ 86%] 2025-09-07T09:35:01.0599032Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0072s] [ 86%] 2025-09-07T09:35:01.0599287Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0043s] [ 86%] 2025-09-07T09:35:01.0599614Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0610s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 86%] 2025-09-07T09:35:01.0599940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0600265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0050s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 86%] 2025-09-07T09:35:01.0601811Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0602142Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0507s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0602465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0602746Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0199s] [ 87%] 2025-09-07T09:35:01.0603020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1805s] [ 87%] 2025-09-07T09:35:01.0603279Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1757s] [ 87%] 2025-09-07T09:35:01.0603533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.2030s] [ 87%] 2025-09-07T09:35:01.0603801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1959s] [ 87%] 2025-09-07T09:35:01.0604069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.2340s] [ 87%] 2025-09-07T09:35:01.0604399Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.2069s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0604731Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0019s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0605058Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0107s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0605385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0605711Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0606039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0606295Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0164s] [ 87%] 2025-09-07T09:35:01.0606621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.1363s] [ 87%] 2025-09-07T09:35:01.0606901Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0914s] [ 87%] 2025-09-07T09:35:01.0607172Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.2552s] [ 87%] 2025-09-07T09:35:01.0607425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1963s] [ 87%] 2025-09-07T09:35:01.0607678Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.2138s] [ 87%] 2025-09-07T09:35:01.0609214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.1178s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0609568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0016s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0609914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0610241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0014s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0610567Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0567s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0610892Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0070s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0611148Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0102s] [ 87%] 2025-09-07T09:35:01.0611407Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0102s] [ 87%] 2025-09-07T09:35:01.0611660Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1402s] [ 87%] 2025-09-07T09:35:01.0611912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0862s] [ 87%] 2025-09-07T09:35:01.0612180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0736s] [ 87%] 2025-09-07T09:35:01.0612448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0729s] [ 87%] 2025-09-07T09:35:01.0612777Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0665s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0613103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0613440Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0090s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0613776Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0614098Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0614423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0614681Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0114s] [ 87%] 2025-09-07T09:35:01.0614937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0618s] [ 87%] 2025-09-07T09:35:01.0616351Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0623s] [ 87%] 2025-09-07T09:35:01.0616692Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0858s] [ 87%] 2025-09-07T09:35:01.0616945Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0490s] [ 87%] 2025-09-07T09:35:01.0617200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0492s] [ 87%] 2025-09-07T09:35:01.0617548Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0380s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 87%] 2025-09-07T09:35:01.0617893Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0618218Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 87%] 2025-09-07T09:35:01.0618538Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0618880Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0042s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0619296Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0619552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0104s] [ 88%] 2025-09-07T09:35:01.0619810Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0388s] [ 88%] 2025-09-07T09:35:01.0620064Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0280s] [ 88%] 2025-09-07T09:35:01.0620319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0489s] [ 88%] 2025-09-07T09:35:01.0620572Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0510s] [ 88%] 2025-09-07T09:35:01.0620825Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0414s] [ 88%] 2025-09-07T09:35:01.0621152Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0376s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0621479Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0621818Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0068s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0622153Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0623707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0624036Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0624310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0116s] [ 88%] 2025-09-07T09:35:01.0624587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0310s] [ 88%] 2025-09-07T09:35:01.0624839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.1602s] [ 88%] 2025-09-07T09:35:01.0625092Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.2216s] [ 88%] 2025-09-07T09:35:01.0625341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.1673s] [ 88%] 2025-09-07T09:35:01.0625593Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1590s] [ 88%] 2025-09-07T09:35:01.0625921Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.1620s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0626248Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0626655Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0626978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0627329Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0145s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0627669Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0627925Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0148s] [ 88%] 2025-09-07T09:35:01.0628183Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.2147s] [ 88%] 2025-09-07T09:35:01.0628437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.2283s] [ 88%] 2025-09-07T09:35:01.0628741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.1860s] [ 88%] 2025-09-07T09:35:01.0629012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.2344s] [ 88%] 2025-09-07T09:35:01.0629265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.1560s] [ 88%] 2025-09-07T09:35:01.0629592Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.1606s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0631159Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0064s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0631487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0707s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0631814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0016s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0632139Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0044s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0632463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0632734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0189s] [ 88%] 2025-09-07T09:35:01.0633003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.2299s] [ 88%] 2025-09-07T09:35:01.0633255Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.2298s] [ 88%] 2025-09-07T09:35:01.0633506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.2824s] [ 88%] 2025-09-07T09:35:01.0633760Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0736s] [ 88%] 2025-09-07T09:35:01.0634030Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0170s] [ 88%] 2025-09-07T09:35:01.0634367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.1588s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 88%] 2025-09-07T09:35:01.0634693Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0017s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0635017Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0043s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 88%] 2025-09-07T09:35:01.0635341Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0015s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0635662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0931s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0635987Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0016s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0636245Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0044s] [ 89%] 2025-09-07T09:35:01.0636563Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0039s] [ 89%] 2025-09-07T09:35:01.0636813Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 89%] 2025-09-07T09:35:01.0638263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 89%] 2025-09-07T09:35:01.0638544Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0037s] [ 89%] 2025-09-07T09:35:01.0638798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 89%] 2025-09-07T09:35:01.0639125Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0639472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0012s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0639814Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0013s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0640137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0640461Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0640784Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0641037Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0040s] [ 89%] 2025-09-07T09:35:01.0641293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0039s] [ 89%] 2025-09-07T09:35:01.0641545Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0039s] [ 89%] 2025-09-07T09:35:01.0641800Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 89%] 2025-09-07T09:35:01.0642048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0039s] [ 89%] 2025-09-07T09:35:01.0642313Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0040s] [ 89%] 2025-09-07T09:35:01.0642653Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0642980Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0643303Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0643652Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0643988Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0011s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0645494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0645754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 89%] 2025-09-07T09:35:01.0646012Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 89%] 2025-09-07T09:35:01.0646262Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 89%] 2025-09-07T09:35:01.0646578Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 89%] 2025-09-07T09:35:01.0646829Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0028s] [ 89%] 2025-09-07T09:35:01.0647083Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 89%] 2025-09-07T09:35:01.0647410Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0647758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0648100Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0648421Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0648740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0649082Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0649353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 89%] 2025-09-07T09:35:01.0649607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 89%] 2025-09-07T09:35:01.0649855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 89%] 2025-09-07T09:35:01.0650107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 89%] 2025-09-07T09:35:01.0650357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 89%] 2025-09-07T09:35:01.0650605Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 89%] 2025-09-07T09:35:01.0650928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 89%] 2025-09-07T09:35:01.0651278Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0652855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 89%] 2025-09-07T09:35:01.0653196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0653530Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0653850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0654102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 90%] 2025-09-07T09:35:01.0654357Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 90%] 2025-09-07T09:35:01.0654620Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 90%] 2025-09-07T09:35:01.0654885Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 90%] 2025-09-07T09:35:01.0655157Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 90%] 2025-09-07T09:35:01.0655409Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 90%] 2025-09-07T09:35:01.0655736Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0656062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0656385Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0010s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0656796Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0657119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0657463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0657733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 90%] 2025-09-07T09:35:01.0657986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 90%] 2025-09-07T09:35:01.0658233Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0031s] [ 90%] 2025-09-07T09:35:01.0658482Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 90%] 2025-09-07T09:35:01.0659986Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 90%] 2025-09-07T09:35:01.0660257Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 90%] 2025-09-07T09:35:01.0660583Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0660907Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0661230Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0661549Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0661871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0662190Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0662444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 90%] 2025-09-07T09:35:01.0662700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0032s] [ 90%] 2025-09-07T09:35:01.0662966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0032s] [ 90%] 2025-09-07T09:35:01.0663241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 90%] 2025-09-07T09:35:01.0663489Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0032s] [ 90%] 2025-09-07T09:35:01.0663740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 90%] 2025-09-07T09:35:01.0664066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0664406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0664740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0665062Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0665386Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0665707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0667241Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 90%] 2025-09-07T09:35:01.0667504Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 90%] 2025-09-07T09:35:01.0667756Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0033s] [ 90%] 2025-09-07T09:35:01.0668005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 90%] 2025-09-07T09:35:01.0668252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0031s] [ 90%] 2025-09-07T09:35:01.0668541Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0035s] [ 90%] 2025-09-07T09:35:01.0668883Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 90%] 2025-09-07T09:35:01.0669210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0669533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 90%] 2025-09-07T09:35:01.0669873Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 SKIPPED [0.0008s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 91%] 2025-09-07T09:35:01.0670211Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 SKIPPED [0.0009s] (Will call _fill_mem_eff_dropout_mask with too many threads!) [ 91%] 2025-09-07T09:35:01.0670529Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 SKIPPED [0.0007s] (Will call _fill_mem_eff_dropout_mask with too many threads!) 
[ 91%] 2025-09-07T09:35:01.0670786Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0671047Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0671298Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0671551Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0671803Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 91%] 2025-09-07T09:35:01.0672057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 91%] 2025-09-07T09:35:01.0672310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 91%] 2025-09-07T09:35:01.0672578Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0037s] [ 91%] 2025-09-07T09:35:01.0672841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 91%] 2025-09-07T09:35:01.0674338Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0674589Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0030s] [ 91%] 2025-09-07T09:35:01.0674846Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0675113Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0675378Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 91%] 2025-09-07T09:35:01.0675626Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0675877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0676127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 91%] 2025-09-07T09:35:01.0676377Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 91%] 2025-09-07T09:35:01.0676702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 91%] 2025-09-07T09:35:01.0676956Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0033s] [ 91%] 2025-09-07T09:35:01.0677206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 91%] 2025-09-07T09:35:01.0677457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0032s] [ 91%] 2025-09-07T09:35:01.0677704Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0677983Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0678252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0678506Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0678755Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0679005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 91%] 2025-09-07T09:35:01.0679269Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 91%] 2025-09-07T09:35:01.0680693Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 91%] 2025-09-07T09:35:01.0680947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 91%] 2025-09-07T09:35:01.0681203Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 91%] 2025-09-07T09:35:01.0681454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0681705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 91%] 2025-09-07T09:35:01.0681953Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 91%] 2025-09-07T09:35:01.0682206Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 91%] 2025-09-07T09:35:01.0682456Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0027s] [ 91%] 2025-09-07T09:35:01.0682709Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 91%] 2025-09-07T09:35:01.0682955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 91%] 2025-09-07T09:35:01.0683231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 91%] 2025-09-07T09:35:01.0683491Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 91%] 2025-09-07T09:35:01.0683740Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 91%] 2025-09-07T09:35:01.0683989Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0684243Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 91%] 2025-09-07T09:35:01.0684505Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0032s] [ 91%] 2025-09-07T09:35:01.0684771Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 92%] 2025-09-07T09:35:01.0685016Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0685265Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0030s] [ 92%] 2025-09-07T09:35:01.0685516Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 92%] 2025-09-07T09:35:01.0685768Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0687234Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0687486Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED 
[0.0026s] [ 92%] 2025-09-07T09:35:01.0687741Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0687992Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 92%] 2025-09-07T09:35:01.0688242Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 92%] 2025-09-07T09:35:01.0688533Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 92%] 2025-09-07T09:35:01.0688798Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 92%] 2025-09-07T09:35:01.0689049Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 92%] 2025-09-07T09:35:01.0689294Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 92%] 2025-09-07T09:35:01.0689547Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0037s] [ 92%] 2025-09-07T09:35:01.0689844Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0041s] [ 92%] 2025-09-07T09:35:01.0690114Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0031s] [ 92%] 2025-09-07T09:35:01.0690359Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0690607Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0037s] [ 92%] 2025-09-07T09:35:01.0690855Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0691103Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0691353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 
92%] 2025-09-07T09:35:01.0691608Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0691857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0692107Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0693552Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0693826Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0694093Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0694347Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0694592Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0694842Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0695105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 92%] 2025-09-07T09:35:01.0695367Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 92%] 2025-09-07T09:35:01.0695623Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0695877Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0696127Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0696377Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 
PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0696696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0696947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 92%] 2025-09-07T09:35:01.0697198Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 92%] 2025-09-07T09:35:01.0697450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0697696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0030s] [ 92%] 2025-09-07T09:35:01.0697973Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0031s] [ 92%] 2025-09-07T09:35:01.0698237Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 92%] 2025-09-07T09:35:01.0698485Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 92%] 2025-09-07T09:35:01.0698735Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0035s] [ 92%] 2025-09-07T09:35:01.0700212Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0034s] [ 92%] 2025-09-07T09:35:01.0700483Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 92%] 2025-09-07T09:35:01.0700758Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 93%] 2025-09-07T09:35:01.0701005Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 93%] 2025-09-07T09:35:01.0701256Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_1024_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 93%] 2025-09-07T09:35:01.0701511Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0028s] [ 93%] 2025-09-07T09:35:01.0701765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0702014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0702266Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 93%] 2025-09-07T09:35:01.0702515Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 93%] 2025-09-07T09:35:01.0702765Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 93%] 2025-09-07T09:35:01.0703018Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 93%] 2025-09-07T09:35:01.0703288Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0703555Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 93%] 2025-09-07T09:35:01.0703808Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 93%] 2025-09-07T09:35:01.0704057Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0704310Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 93%] 2025-09-07T09:35:01.0704575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0704839Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0705087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0706433Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED 
[0.0027s] [ 93%] 2025-09-07T09:35:01.0706757Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 93%] 2025-09-07T09:35:01.0707009Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 93%] 2025-09-07T09:35:01.0707259Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0031s] [ 93%] 2025-09-07T09:35:01.0707513Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 93%] 2025-09-07T09:35:01.0707763Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 93%] 2025-09-07T09:35:01.0708014Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 93%] 2025-09-07T09:35:01.0708260Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0708534Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0708801Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0709058Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 93%] 2025-09-07T09:35:01.0709308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 93%] 2025-09-07T09:35:01.0709559Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0024s] [ 93%] 2025-09-07T09:35:01.0709822Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0022s] [ 93%] 2025-09-07T09:35:01.0710087Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 93%] 2025-09-07T09:35:01.0710337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED 
[0.0026s] [ 93%] 2025-09-07T09:35:01.0710590Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0710838Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 93%] 2025-09-07T09:35:01.0711091Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0711337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 93%] 2025-09-07T09:35:01.0711588Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 93%] 2025-09-07T09:35:01.0712937Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 93%] 2025-09-07T09:35:01.0713189Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 93%] 2025-09-07T09:35:01.0713434Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 93%] 2025-09-07T09:35:01.0713696Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 93%] 2025-09-07T09:35:01.0713955Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 93%] 2025-09-07T09:35:01.0714202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 93%] 2025-09-07T09:35:01.0714454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 93%] 2025-09-07T09:35:01.0714707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 93%] 2025-09-07T09:35:01.0714967Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 93%] 2025-09-07T09:35:01.0715227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 
94%] 2025-09-07T09:35:01.0715472Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0715721Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 94%] 2025-09-07T09:35:01.0715969Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0716220Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0716465Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0716789Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 94%] 2025-09-07T09:35:01.0717039Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 94%] 2025-09-07T09:35:01.0717286Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0717535Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 94%] 2025-09-07T09:35:01.0717805Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 94%] 2025-09-07T09:35:01.0719166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0719418Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0719665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0719917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0720193Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0024s] 
[ 94%] 2025-09-07T09:35:01.0720460Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 94%] 2025-09-07T09:35:01.0720705Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0720954Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0721201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 94%] 2025-09-07T09:35:01.0721450Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 94%] 2025-09-07T09:35:01.0721697Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 94%] 2025-09-07T09:35:01.0721947Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0722196Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0722444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 94%] 2025-09-07T09:35:01.0722689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0722935Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0723202Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0723467Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 94%] 2025-09-07T09:35:01.0723715Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 94%] 2025-09-07T09:35:01.0723964Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 
94%] 2025-09-07T09:35:01.0724214Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 94%] 2025-09-07T09:35:01.0725566Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0023s] [ 94%] 2025-09-07T09:35:01.0725831Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0726084Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0726332Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 94%] 2025-09-07T09:35:01.0726665Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0726912Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0727163Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0727413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0727664Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 94%] 2025-09-07T09:35:01.0727911Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 94%] 2025-09-07T09:35:01.0728158Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 94%] 2025-09-07T09:35:01.0728432Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 94%] 2025-09-07T09:35:01.0728695Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 94%] 2025-09-07T09:35:01.0728944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 
94%] 2025-09-07T09:35:01.0729195Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 94%] 2025-09-07T09:35:01.0729443Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 94%] 2025-09-07T09:35:01.0729707Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 95%] 2025-09-07T09:35:01.0729974Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 95%] 2025-09-07T09:35:01.0730222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_103_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 95%] 2025-09-07T09:35:01.0730476Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 95%] 2025-09-07T09:35:01.0731834Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0732085Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0732337Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 95%] 2025-09-07T09:35:01.0732587Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0030s] [ 95%] 2025-09-07T09:35:01.0732841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0733097Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 95%] 2025-09-07T09:35:01.0733353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0036s] [ 95%] 2025-09-07T09:35:01.0733621Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 95%] 2025-09-07T09:35:01.0733885Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0034s] [ 95%] 2025-09-07T09:35:01.0734136Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0036s] [ 95%] 2025-09-07T09:35:01.0734388Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0039s] [ 95%] 2025-09-07T09:35:01.0734641Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0037s] [ 95%] 2025-09-07T09:35:01.0734908Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0035s] [ 95%] 2025-09-07T09:35:01.0735170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0735423Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0735670Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 
PASSED [0.0030s] [ 95%] 2025-09-07T09:35:01.0735923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 95%] 2025-09-07T09:35:01.0736174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0043s] [ 95%] 2025-09-07T09:35:01.0736427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0045s] [ 95%] 2025-09-07T09:35:01.0736750Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0041s] [ 95%] 2025-09-07T09:35:01.0737003Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0038s] [ 95%] 2025-09-07T09:35:01.0738348Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0035s] [ 95%] 2025-09-07T09:35:01.0738602Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0032s] [ 95%] 2025-09-07T09:35:01.0738891Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0739213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 95%] 2025-09-07T09:35:01.0739463Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0739712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 95%] 2025-09-07T09:35:01.0739960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0740227Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 95%] 2025-09-07T09:35:01.0740494Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 95%] 2025-09-07T09:35:01.0740748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 
PASSED [0.0028s] [ 95%] 2025-09-07T09:35:01.0740996Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0741249Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0741496Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 95%] 2025-09-07T09:35:01.0741748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 95%] 2025-09-07T09:35:01.0741997Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0742251Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0742499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 95%] 2025-09-07T09:35:01.0742747Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0743008Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 95%] 2025-09-07T09:35:01.0743283Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 95%] 2025-09-07T09:35:01.0744635Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 95%] 2025-09-07T09:35:01.0744889Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 95%] 2025-09-07T09:35:01.0745137Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 95%] 2025-09-07T09:35:01.0745406Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 96%] 2025-09-07T09:35:01.0745667Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED 
[0.0028s] [ 96%] 2025-09-07T09:35:01.0745917Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0746166Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 96%] 2025-09-07T09:35:01.0746420Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 96%] 2025-09-07T09:35:01.0746754Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 96%] 2025-09-07T09:35:01.0747004Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 96%] 2025-09-07T09:35:01.0747252Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 96%] 2025-09-07T09:35:01.0747503Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 96%] 2025-09-07T09:35:01.0747754Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0748006Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 96%] 2025-09-07T09:35:01.0748289Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0748554Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0748804Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 96%] 2025-09-07T09:35:01.0749053Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 96%] 2025-09-07T09:35:01.0749302Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 96%] 2025-09-07T09:35:01.0749570Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED 
[0.0026s] [ 96%] 2025-09-07T09:35:01.0749833Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 96%] 2025-09-07T09:35:01.0751178Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 96%] 2025-09-07T09:35:01.0751425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 96%] 2025-09-07T09:35:01.0751674Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 96%] 2025-09-07T09:35:01.0751923Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0752176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0752425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0752675Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0752922Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0753170Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 96%] 2025-09-07T09:35:01.0753444Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0753712Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0753960Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0754210Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0754458Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED 
[0.0027s] [ 96%] 2025-09-07T09:35:01.0754727Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 96%] 2025-09-07T09:35:01.0754990Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0032s] [ 96%] 2025-09-07T09:35:01.0755244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 96%] 2025-09-07T09:35:01.0755493Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0030s] [ 96%] 2025-09-07T09:35:01.0755744Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0030s] [ 96%] 2025-09-07T09:35:01.0755994Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0031s] [ 96%] 2025-09-07T09:35:01.0757425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0031s] [ 96%] 2025-09-07T09:35:01.0757683Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0757940Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 96%] 2025-09-07T09:35:01.0758187Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0029s] [ 96%] 2025-09-07T09:35:01.0758437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0758734Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0027s] [ 96%] 2025-09-07T09:35:01.0759013Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 96%] 2025-09-07T09:35:01.0759263Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0034s] [ 96%] 2025-09-07T09:35:01.0759515Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED 
[0.0034s] [ 96%] 2025-09-07T09:35:01.0759764Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0034s] [ 96%] 2025-09-07T09:35:01.0760048Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0035s] [ 97%] 2025-09-07T09:35:01.0760319Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0037s] [ 97%] 2025-09-07T09:35:01.0760568Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_2048_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 97%] 2025-09-07T09:35:01.0760817Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0025s] [ 97%] 2025-09-07T09:35:01.0761069Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 97%] 2025-09-07T09:35:01.0761315Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 97%] 2025-09-07T09:35:01.0761563Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 97%] 2025-09-07T09:35:01.0761808Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 97%] 2025-09-07T09:35:01.0762060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0022s] [ 97%] 2025-09-07T09:35:01.0762308Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0762560Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0763914Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0764180Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0027s] [ 97%] 2025-09-07T09:35:01.0764427Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 
97%] 2025-09-07T09:35:01.0764675Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0764927Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0765193Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0765453Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0765702Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0765948Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0025s] [ 97%] 2025-09-07T09:35:01.0766196Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 97%] 2025-09-07T09:35:01.0766444Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0029s] [ 97%] 2025-09-07T09:35:01.0766762Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 97%] 2025-09-07T09:35:01.0767007Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 97%] 2025-09-07T09:35:01.0767254Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0767499Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0767748Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_128_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0768028Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0768293Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0023s] [ 
97%] 2025-09-07T09:35:01.0768536Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0023s] [ 97%] 2025-09-07T09:35:01.0768783Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 97%] 2025-09-07T09:35:01.0770128Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0024s] [ 97%] 2025-09-07T09:35:01.0770417Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 97%] 2025-09-07T09:35:01.0770689Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0770941Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0771188Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0027s] [ 97%] 2025-09-07T09:35:01.0771438Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0771683Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 97%] 2025-09-07T09:35:01.0771928Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0027s] [ 97%] 2025-09-07T09:35:01.0772177Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0772425Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0772667Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0772913Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0026s] [ 97%] 2025-09-07T09:35:01.0773176Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 97%] 
2025-09-07T09:35:01.0773437Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 97%] 2025-09-07T09:35:01.0773684Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 97%] 2025-09-07T09:35:01.0773931Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0029s] [ 97%] 2025-09-07T09:35:01.0774174Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 97%] 2025-09-07T09:35:01.0774454Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0774720Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0774966Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0034s] [ 98%] 2025-09-07T09:35:01.0775213Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 
PASSED [0.0025s] [ 98%] 2025-09-07T09:35:01.0776628Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 98%] 2025-09-07T09:35:01.0776876Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0024s] [ 98%] 2025-09-07T09:35:01.0777119Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 98%] 2025-09-07T09:35:01.0777363Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0022s] [ 98%] 2025-09-07T09:35:01.0777611Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 98%] 2025-09-07T09:35:01.0777857Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0778105Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 98%] 2025-09-07T09:35:01.0778389Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 98%] 2025-09-07T09:35:01.0778654Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 98%] 2025-09-07T09:35:01.0778899Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0779201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0779448Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0779700Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0779978Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 98%] 2025-09-07T09:35:01.0780244Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0025s] [ 98%] 
2025-09-07T09:35:01.0780487Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0780733Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0780982Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 98%] 2025-09-07T09:35:01.0781231Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0030s] [ 98%] 2025-09-07T09:35:01.0781473Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0029s] [ 98%] 2025-09-07T09:35:01.0781718Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0783077Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0027s] [ 98%] 2025-09-07T09:35:01.0783323Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED 
[0.0027s] [ 98%] 2025-09-07T09:35:01.0783571Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0024s] [ 98%] 2025-09-07T09:35:01.0783845Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0025s] [ 98%] 2025-09-07T09:35:01.0784102Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0025s] [ 98%] 2025-09-07T09:35:01.0784353Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0023s] [ 98%] 2025-09-07T09:35:01.0784600Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0023s] [ 98%] 2025-09-07T09:35:01.0784850Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0024s] [ 98%] 2025-09-07T09:35:01.0785115Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0785380Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0027s] [ 98%] 2025-09-07T09:35:01.0785624Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0785871Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0029s] [ 98%] 2025-09-07T09:35:01.0786117Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0786365Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0786682Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale0_cuda_bfloat16 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0786929Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0024s] [ 98%] 2025-09-07T09:35:01.0787173Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16 PASSED [0.0027s] [ 98%] 
2025-09-07T09:35:01.0787419Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0787662Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32 PASSED [0.0026s] [ 98%] 2025-09-07T09:35:01.0787944Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32 PASSED [0.0025s] [ 98%] 2025-09-07T09:35:01.0789322Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale0_cuda_bfloat16 PASSED [0.0030s] [ 98%] 2025-09-07T09:35:01.0789575Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_cuda_bfloat16 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0789820Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16 PASSED [0.0028s] [ 98%] 2025-09-07T09:35:01.0790070Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16 PASSED [0.0028s] [ 99%] 2025-09-07T09:35:01.0790356Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32 PASSED 
[0.0028s] [ 99%] 2025-09-07T09:35:01.0790629Z test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_96_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32 PASSED [0.0026s] [ 99%] 2025-09-07T09:35:01.0790868Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_True_cuda SKIPPED [0.0001s] (Fused SDPA was not built for this system) [ 99%] 2025-09-07T09:35:01.0791066Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel0_cuda PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0791261Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel1_cuda PASSED [0.0013s] [ 99%] 2025-09-07T09:35:01.0791457Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel0_cuda PASSED [1.2628s] [ 99%] 2025-09-07T09:35:01.0791653Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel1_cuda PASSED [0.0062s] [ 99%] 2025-09-07T09:35:01.0791841Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_False_cuda PASSED [0.0026s] [ 99%] 2025-09-07T09:35:01.0792029Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_True_cuda PASSED [0.0012s] [ 99%] 2025-09-07T09:35:01.0792222Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_False_cuda PASSED [0.0132s] [ 99%] 2025-09-07T09:35:01.0792413Z test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_True_cuda PASSED 
[0.0133s] [ 99%] 2025-09-07T09:35:01.0792636Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_False_cuda SKIPPED [0.0005s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 99%] 2025-09-07T09:35:01.0792852Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_True_cuda SKIPPED [0.0005s] (skipIfRocm: test doesn't currently work on the ROCm stack) [ 99%] 2025-09-07T09:35:01.0793060Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_bfloat16_cuda_bfloat16 PASSED [0.0027s] [ 99%] 2025-09-07T09:35:01.0793282Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_float16_cuda_float16 PASSED [0.0017s] [ 99%] 2025-09-07T09:35:01.0793501Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_bfloat16_cuda_bfloat16 PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0793703Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_float16_cuda_float16 PASSED [0.0017s] [ 99%] 2025-09-07T09:35:01.0794999Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_bfloat16_cuda_bfloat16 PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0795201Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_float16_cuda_float16 PASSED [0.0017s] [ 99%] 2025-09-07T09:35:01.0795403Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_bfloat16_cuda_bfloat16 PASSED [0.0017s] [ 99%] 2025-09-07T09:35:01.0795634Z 
test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_float16_cuda_float16 PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0795835Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_False_cuda PASSED [0.0015s] [ 99%] 2025-09-07T09:35:01.0796020Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_True_cuda PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0796200Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_False_cuda PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0796381Z test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_True_cuda PASSED [0.0017s] [ 99%] 2025-09-07T09:35:01.0796584Z test_transformers.py::TestSDPACudaOnlyCUDA::test_singelton_head_dim_stride_ne_1_cuda PASSED [0.0008s] [ 99%] 2025-09-07T09:35:01.0796719Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape0_cuda PASSED [0.0035s] [ 99%] 2025-09-07T09:35:01.0796851Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape1_cuda PASSED [0.0243s] [ 99%] 2025-09-07T09:35:01.0796980Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape2_cuda PASSED [0.0071s] [ 99%] 2025-09-07T09:35:01.0797109Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_1_shape3_cuda PASSED [0.0024s] [ 99%] 2025-09-07T09:35:01.0797238Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape0_cuda PASSED [0.0019s] [ 99%] 2025-09-07T09:35:01.0797366Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape1_cuda PASSED [0.0016s] [ 99%] 2025-09-07T09:35:01.0797610Z 
test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape2_cuda SKIPPED [0.0005s] (Lower right causal mask will produce NaNs in the output when seq_len_q > seq_len_kv!) [ 99%] 2025-09-07T09:35:01.0797741Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_2_shape3_cuda PASSED [0.0022s] [ 99%] 2025-09-07T09:35:01.0797887Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape0_cuda PASSED [1.1448s] [ 99%] 2025-09-07T09:35:01.0798030Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape1_cuda PASSED [0.0437s] [ 99%] 2025-09-07T09:35:01.0798169Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape2_cuda PASSED [0.0393s] [ 99%] 2025-09-07T09:35:01.0798309Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_1_shape3_cuda PASSED [0.0362s] [ 99%] 2025-09-07T09:35:01.0799593Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape0_cuda PASSED [0.0342s] [ 99%] 2025-09-07T09:35:01.0799756Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape1_cuda PASSED [0.0388s] [ 99%] 2025-09-07T09:35:01.0800007Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape2_cuda SKIPPED [0.0008s] (Lower right causal mask will produce NaNs in the output when seq_len_q > seq_len_kv!) 
[ 99%] 2025-09-07T09:35:01.0800148Z test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_2_shape3_cuda PASSED [0.6842s] [ 99%] 2025-09-07T09:35:01.0800265Z test_transformers.py::TestAttnBiasCUDA::test_is_causal_and_mask_fails_cuda PASSED [0.0011s] [ 99%] 2025-09-07T09:35:01.0800390Z test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape0_cuda PASSED [0.0013s] [ 99%] 2025-09-07T09:35:01.0800513Z test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape1_cuda PASSED [0.0008s] [ 99%] 2025-09-07T09:35:01.0800636Z test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape2_cuda PASSED [0.0008s] [ 99%] 2025-09-07T09:35:01.0800774Z test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape3_cuda PASSED [0.0007s] [100%] 2025-09-07T09:35:01.0800798Z 2025-09-07T09:35:01.0800999Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_transformers/test_transformers-bcc35b60640cc0b7.xml - 2025-09-07T09:35:01.0801078Z ============= 4147 passed, 649 skipped, 7448 deselected in 55.23s ============== 2025-09-07T09:35:01.0801472Z The following tests failed and then succeeded when run in a new process['test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda'] 2025-09-07T09:35:01.0801476Z 2025-09-07T09:35:01.0801617Z FINISHED PRINTING LOG FILE of test_transformers 1/1 (test/test-reports/test_transformers_1.1_62cb5af01563119b_.log) 2025-09-07T09:35:01.0801621Z 2025-09-07T09:35:01.0801714Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-09-07T09:35:01.0801764Z Uploading artifacts took 0.00 seconds 2025-09-07T09:35:01.3509899Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: 
pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. 2025-09-07T09:35:01.3510414Z import pkg_resources 2025-09-07T09:35:02.1608268Z Running test batch 'tests to run' cost 10203.37 seconds 2025-09-07T09:35:02.1614840Z Emiting td_test_failure_stats_v2 2025-09-07T09:35:02.1618443Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1757237702_edbcd0708bcd11f0a7fb56b883cc8d63 2025-09-07T09:35:02.2182923Z /var/lib/jenkins/pytorch/tools/stats/upload_metrics.py:156: UserWarning: Error uploading metric td_test_failure_stats_v2 to DynamoDB: Unable to locate credentials 2025-09-07T09:35:02.2183287Z warn(f"Error uploading metric {metric_name} to DynamoDB: {e}") 2025-09-07T09:35:02.2184363Z inductor/test_control_flow 1/2 failed! 2025-09-07T09:35:03.0021877Z 2025-09-07T09:35:03.0022119Z real 170m6.137s 2025-09-07T09:35:03.0022428Z user 3369m57.937s 2025-09-07T09:35:03.0111478Z sys 83m8.119s 2025-09-07T09:35:03.0111647Z + sccache_epilogue 2025-09-07T09:35:03.0111849Z + echo '::group::Sccache Compilation Log' 2025-09-07T09:35:03.0119095Z ##[group]Sccache Compilation Log 2025-09-07T09:35:03.0119321Z + echo '=================== sccache compilation log ===================' 2025-09-07T09:35:03.0119575Z =================== sccache compilation log =================== 2025-09-07T09:35:03.0119942Z + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-09-07T09:35:03.0120581Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-09-07T09:35:03.0121035Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-09-07T09:35:03.0121618Z + sccache --show-stats 2025-09-07T09:35:03.0140241Z Compile requests 8486 
2025-09-07T09:35:03.0140463Z Compile requests executed 125 2025-09-07T09:35:03.0141529Z Cache hits 18 2025-09-07T09:35:03.0141876Z Cache hits (C/C++) 18 2025-09-07T09:35:03.0142113Z Cache misses 106 2025-09-07T09:35:03.0142333Z Cache misses (C/C++) 100 2025-09-07T09:35:03.0142554Z Cache misses (HIP) 6 2025-09-07T09:35:03.0142781Z Cache hits rate 14.52 % 2025-09-07T09:35:03.0143003Z Cache hits rate (C/C++) 15.25 % 2025-09-07T09:35:03.0143241Z Cache hits rate (HIP) 0.00 % 2025-09-07T09:35:03.0143515Z Cache timeouts 0 2025-09-07T09:35:03.0143754Z Cache read errors 0 2025-09-07T09:35:03.0143972Z Forced recaches 0 2025-09-07T09:35:03.0144601Z Cache write errors 0 2025-09-07T09:35:03.0144914Z Cache errors 0 2025-09-07T09:35:03.0145130Z Compilations 106 2025-09-07T09:35:03.0145352Z Compilation failures 1 2025-09-07T09:35:03.0145596Z Non-cacheable compilations 0 2025-09-07T09:35:03.0145811Z Non-cacheable calls 64 2025-09-07T09:35:03.0146041Z Non-compilation calls 8297 2025-09-07T09:35:03.0146267Z Unsupported compiler calls 0 2025-09-07T09:35:03.0146726Z Average cache write 0.000 s 2025-09-07T09:35:03.0146970Z Average compiler 3.386 s 2025-09-07T09:35:03.0147197Z Average cache read hit 0.000 s 2025-09-07T09:35:03.0147433Z Failed distributed compilations 0 2025-09-07T09:35:03.0147590Z 2025-09-07T09:35:03.0147680Z Non-cacheable reasons: 2025-09-07T09:35:03.0147881Z -E 41 2025-09-07T09:35:03.0148111Z unknown source language 23 2025-09-07T09:35:03.0148268Z 2025-09-07T09:35:03.0148418Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-09-07T09:35:03.0148739Z Use direct/preprocessor mode? yes 2025-09-07T09:35:03.0148978Z Version (client) 0.10.0 2025-09-07T09:35:03.0149208Z Cache size 7 MiB 2025-09-07T09:35:03.0149441Z Max cache size 10 GiB 2025-09-07T09:35:03.0149688Z + sccache --stop-server 2025-09-07T09:35:03.0168422Z Stopping sccache server... 
2025-09-07T09:35:03.0172164Z Compile requests 8486 2025-09-07T09:35:03.0172381Z Compile requests executed 125 2025-09-07T09:35:03.0174000Z Cache hits 18 2025-09-07T09:35:03.0174409Z Cache hits (C/C++) 18 2025-09-07T09:35:03.0174747Z Cache misses 106 2025-09-07T09:35:03.0175011Z Cache misses (C/C++) 100 2025-09-07T09:35:03.0175269Z Cache misses (HIP) 6 2025-09-07T09:35:03.0175540Z Cache hits rate 14.52 % 2025-09-07T09:35:03.0175825Z Cache hits rate (C/C++) 15.25 % 2025-09-07T09:35:03.0176090Z Cache hits rate (HIP) 0.00 % 2025-09-07T09:35:03.0176351Z Cache timeouts 0 2025-09-07T09:35:03.0176796Z Cache read errors 0 2025-09-07T09:35:03.0177049Z Forced recaches 0 2025-09-07T09:35:03.0177319Z Cache write errors 0 2025-09-07T09:35:03.0177568Z Cache errors 0 2025-09-07T09:35:03.0177816Z Compilations 106 2025-09-07T09:35:03.0178071Z Compilation failures 1 2025-09-07T09:35:03.0178336Z Non-cacheable compilations 0 2025-09-07T09:35:03.0178956Z Non-cacheable calls 64 2025-09-07T09:35:03.0180358Z Non-compilation calls 8297 2025-09-07T09:35:03.0180621Z Unsupported compiler calls 0 2025-09-07T09:35:03.0180891Z Average cache write 0.000 s 2025-09-07T09:35:03.0181164Z Average compiler 3.386 s 2025-09-07T09:35:03.0181431Z Average cache read hit 0.000 s 2025-09-07T09:35:03.0181701Z Failed distributed compilations 0 2025-09-07T09:35:03.0181880Z 2025-09-07T09:35:03.0181976Z Non-cacheable reasons: 2025-09-07T09:35:03.0182207Z -E 41 2025-09-07T09:35:03.0182476Z unknown source language 23 2025-09-07T09:35:03.0182646Z 2025-09-07T09:35:03.0182818Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-09-07T09:35:03.0183197Z Use direct/preprocessor mode? yes 2025-09-07T09:35:03.0183478Z Version (client) 0.10.0 2025-09-07T09:35:03.0183716Z Cache size 7 MiB 2025-09-07T09:35:03.0183923Z Max cache size 10 GiB 2025-09-07T09:35:03.0184170Z + echo ::endgroup:: 2025-09-07T09:35:03.0184833Z ##[endgroup] 2025-09-07T09:35:03.0237608Z ##[error]Process completed with exit code 1. 
2025-09-07T09:35:03.0274464Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-09-07T09:35:03.0274860Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-09-07T09:35:03.0275239Z docker exec -t "a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" 2025-09-07T09:35:03.0281286Z shell: /usr/bin/bash -e {0} 2025-09-07T09:35:03.0281400Z env: 2025-09-07T09:35:03.0281492Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:03.0281628Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:03.0281802Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:03.0281965Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:03.0282368Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:03.0282731Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:03.0282840Z AWS_REGION: us-east-1 2025-09-07T09:35:03.0285621Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:03.0285780Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:03.0288408Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:03.0288572Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:03.0288746Z ##[endgroup] 2025-09-07T09:35:03.1775053Z ##[group]Run docker exec -t "a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f" sh -c "sudo chown -R 1001:1001 test" 2025-09-07T09:35:03.1775427Z docker exec -t "a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f" sh -c "sudo chown -R 1001:1001 test" 2025-09-07T09:35:03.1779720Z shell: /usr/bin/bash -e {0} 2025-09-07T09:35:03.1779832Z env: 2025-09-07T09:35:03.1779919Z GIT_DEFAULT_BRANCH: main 
2025-09-07T09:35:03.1780052Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:03.1780230Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:03.1780389Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:03.1780762Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:03.1781133Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:03.1781245Z AWS_REGION: us-east-1 2025-09-07T09:35:03.1781389Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:03.1783895Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:03.1785964Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:03.1786128Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:03.1786419Z ##[endgroup] 2025-09-07T09:35:03.2670933Z ##[group]Run cat test/**/*_toprint.log || true 2025-09-07T09:35:03.2671082Z cat test/**/*_toprint.log || true 2025-09-07T09:35:03.2675054Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:35:03.2675191Z env: 2025-09-07T09:35:03.2675278Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:03.2675404Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:03.2675570Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:03.2675726Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:03.2676096Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:03.2676451Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:03.2676628Z AWS_REGION: us-east-1 2025-09-07T09:35:03.2676754Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:03.2676913Z AWS_SECRET_ACCESS_KEY: *** 
2025-09-07T09:35:03.2678975Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:03.2679138Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:03.2679408Z ##[endgroup] 2025-09-07T09:35:03.2745849Z cat: 'test/**/*_toprint.log': No such file or directory 2025-09-07T09:35:03.2820196Z Prepare all required actions 2025-09-07T09:35:03.2820536Z Getting action download info 2025-09-07T09:35:03.4776197Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-09-07T09:35:04.5744735Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-09-07T09:35:06.1352113Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-09-07T09:35:06.1352268Z with: 2025-09-07T09:35:06.1352360Z use-gha: true 2025-09-07T09:35:06.1352513Z file-suffix: test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529 2025-09-07T09:35:06.1352681Z s3-bucket: gha-artifacts 2025-09-07T09:35:06.1352782Z env: 2025-09-07T09:35:06.1352872Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:06.1353000Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:06.1353183Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:06.1353364Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:06.1353744Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:06.1354110Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:06.1354226Z AWS_REGION: us-east-1 2025-09-07T09:35:06.1354379Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:06.1354534Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:06.1356868Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:06.1357039Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:06.1357216Z 
##[endgroup] 2025-09-07T09:35:06.1393893Z ##[group]Run actions/upload-artifact@v4 2025-09-07T09:35:06.1394018Z with: 2025-09-07T09:35:06.1394200Z name: test-jsons-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip 2025-09-07T09:35:06.1394399Z retention-days: 14 2025-09-07T09:35:06.1394509Z if-no-files-found: warn 2025-09-07T09:35:06.1394615Z path: test/**/*.json 2025-09-07T09:35:06.1394716Z compression-level: 6 2025-09-07T09:35:06.1394813Z overwrite: false 2025-09-07T09:35:06.1394917Z include-hidden-files: false 2025-09-07T09:35:06.1395026Z env: 2025-09-07T09:35:06.1395115Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:06.1395246Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:06.1395419Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:06.1395581Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:06.1396059Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:06.1396424Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:06.1396655Z AWS_REGION: us-east-1 2025-09-07T09:35:06.1396782Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:06.1396934Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:06.1398999Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:06.1399166Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:06.1399342Z ##[endgroup] 2025-09-07T09:35:06.5947989Z With the provided path, there will be 6 files uploaded 2025-09-07T09:35:06.5951128Z Artifact name is valid! 2025-09-07T09:35:06.5951617Z Root directory input is valid! 2025-09-07T09:35:06.7261344Z Beginning upload of artifact content to blob storage 2025-09-07T09:35:07.0119922Z Uploaded bytes 45591 2025-09-07T09:35:07.0642429Z Finished uploading artifact content to blob storage! 
2025-09-07T09:35:07.0643146Z SHA256 digest of uploaded artifact zip is 607351877092514c15d0c91cb41ca301dee84db7c8c539844b31139dd21cb2a1 2025-09-07T09:35:07.0644024Z Finalizing artifact upload 2025-09-07T09:35:07.1496833Z Artifact test-jsons-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip.zip successfully finalized. Artifact ID 3946808983 2025-09-07T09:35:07.1497668Z Artifact test-jsons-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip has been successfully uploaded! Final size is 45591 bytes. Artifact ID is 3946808983 2025-09-07T09:35:07.1498322Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754565/artifacts/3946808983 2025-09-07T09:35:07.1622679Z ##[group]Run actions/upload-artifact@v4 2025-09-07T09:35:07.1622822Z with: 2025-09-07T09:35:07.1623011Z name: test-reports-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip 2025-09-07T09:35:07.1623231Z retention-days: 14 2025-09-07T09:35:07.1623338Z if-no-files-found: ignore 2025-09-07T09:35:07.1623458Z path: test/**/*.xml test/**/*.csv 2025-09-07T09:35:07.1623579Z compression-level: 6 2025-09-07T09:35:07.1625738Z overwrite: false 2025-09-07T09:35:07.1625855Z include-hidden-files: false 2025-09-07T09:35:07.1625995Z env: 2025-09-07T09:35:07.1626082Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:07.1626230Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:07.1626403Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:07.1626693Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:07.1627075Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:07.1627442Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:07.1627554Z AWS_REGION: us-east-1 2025-09-07T09:35:07.1629599Z AWS_ACCESS_KEY_ID: *** 
2025-09-07T09:35:07.1629761Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:07.1631875Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:07.1632045Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:07.1632226Z ##[endgroup] 2025-09-07T09:35:07.6368628Z With the provided path, there will be 142 files uploaded 2025-09-07T09:35:07.6369057Z Artifact name is valid! 2025-09-07T09:35:07.6369248Z Root directory input is valid! 2025-09-07T09:35:07.7590647Z Beginning upload of artifact content to blob storage 2025-09-07T09:35:08.3806238Z Uploaded bytes 949885 2025-09-07T09:35:08.4328390Z Finished uploading artifact content to blob storage! 2025-09-07T09:35:08.4329758Z SHA256 digest of uploaded artifact zip is feffab698bdd8fb843ae3f248b86019b92b285d9e9632873e6b380feacfaac86 2025-09-07T09:35:08.4330484Z Finalizing artifact upload 2025-09-07T09:35:08.5303550Z Artifact test-reports-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip.zip successfully finalized. Artifact ID 3946809035 2025-09-07T09:35:08.5304350Z Artifact test-reports-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip has been successfully uploaded! Final size is 949885 bytes. 
Artifact ID is 3946809035 2025-09-07T09:35:08.5304822Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754565/artifacts/3946809035 2025-09-07T09:35:08.5439523Z ##[group]Run actions/upload-artifact@v4 2025-09-07T09:35:08.5439689Z with: 2025-09-07T09:35:08.5439860Z name: logs-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip 2025-09-07T09:35:08.5440052Z retention-days: 14 2025-09-07T09:35:08.5440165Z if-no-files-found: ignore 2025-09-07T09:35:08.5440288Z path: usage_log.txt test/**/*.log 2025-09-07T09:35:08.5440416Z compression-level: 6 2025-09-07T09:35:08.5440525Z overwrite: false 2025-09-07T09:35:08.5440638Z include-hidden-files: false 2025-09-07T09:35:08.5440752Z env: 2025-09-07T09:35:08.5440860Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:08.5441002Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:08.5441191Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:08.5441480Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:08.5441950Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:08.5442326Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:08.5442447Z AWS_REGION: us-east-1 2025-09-07T09:35:08.5442625Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:08.5442788Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:08.5444864Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:08.5445036Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:08.5445217Z ##[endgroup] 2025-09-07T09:35:08.9724985Z Multiple search paths detected. Calculating the least common ancestor of all paths 2025-09-07T09:35:08.9725473Z The least common ancestor is /home/runner/_work/pytorch/pytorch. 
This will be the root directory of the artifact 2025-09-07T09:35:08.9725774Z With the provided path, there will be 82 files uploaded 2025-09-07T09:35:08.9726028Z Artifact name is valid! 2025-09-07T09:35:08.9726168Z Root directory input is valid! 2025-09-07T09:35:09.0765095Z Beginning upload of artifact content to blob storage 2025-09-07T09:35:09.6895408Z Uploaded bytes 1260964 2025-09-07T09:35:09.7374525Z Finished uploading artifact content to blob storage! 2025-09-07T09:35:09.7375228Z SHA256 digest of uploaded artifact zip is f713a9f16d0c6e797ce0b694ba20be1a3f62f03cdde52cbdd84c23f261ae450a 2025-09-07T09:35:09.7375601Z Finalizing artifact upload 2025-09-07T09:35:09.8338405Z Artifact logs-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip.zip successfully finalized. Artifact ID 3946809095 2025-09-07T09:35:09.8339851Z Artifact logs-runattempt1-test-default-6-6-linux.rocm.gpu.gfx942.1_49774353529.zip has been successfully uploaded! Final size is 1260964 bytes. Artifact ID is 3946809095 2025-09-07T09:35:09.8341047Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754565/artifacts/3946809095 2025-09-07T09:35:09.8505921Z ##[group]Run # shellcheck disable=SC2156 2025-09-07T09:35:09.8506100Z # shellcheck disable=SC2156 2025-09-07T09:35:09.8506338Z find . 
-iname "core.[1-9]*" -exec docker exec "${CONTAINER_NAME}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-09-07T09:35:09.8512234Z shell: /usr/bin/bash -e {0} 2025-09-07T09:35:09.8512366Z env: 2025-09-07T09:35:09.8512471Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:09.8512647Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:09.8512852Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-09-07T09:35:09.8513035Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:09.8513437Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:09.8513936Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:09.8514092Z AWS_REGION: us-east-1 2025-09-07T09:35:09.8514280Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:09.8514447Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:09.8516627Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:09.8516807Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:09.8516993Z ##[endgroup] 2025-09-07T09:35:10.0211779Z ##[group]Run actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 2025-09-07T09:35:10.0211972Z with: 2025-09-07T09:35:10.0212097Z name: coredumps-default-6-6-linux.rocm.gpu.gfx942.1 2025-09-07T09:35:10.0212248Z retention-days: 14 2025-09-07T09:35:10.0215886Z if-no-files-found: ignore 2025-09-07T09:35:10.0216019Z path: ./**/core.[1-9]* 2025-09-07T09:35:10.0216132Z compression-level: 6 2025-09-07T09:35:10.0216237Z overwrite: false 2025-09-07T09:35:10.0216345Z include-hidden-files: false 2025-09-07T09:35:10.0216697Z env: 2025-09-07T09:35:10.0216882Z GIT_DEFAULT_BRANCH: main 2025-09-07T09:35:10.0217021Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-09-07T09:35:10.0217206Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 
2025-09-07T09:35:10.0217376Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-09-07T09:35:10.0220048Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 992 --device /dev/dri/renderD169 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-09-07T09:35:10.0220417Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T09:35:10.0220537Z AWS_REGION: us-east-1 2025-09-07T09:35:10.0220696Z AWS_ACCESS_KEY_ID: *** 2025-09-07T09:35:10.0220854Z AWS_SECRET_ACCESS_KEY: *** 2025-09-07T09:35:10.0222927Z AWS_SESSION_TOKEN: *** 2025-09-07T09:35:10.0223099Z CONTAINER_NAME: a39ab74a215f91c588e65e428af6eb54b050b7235261f1f4cb5d995fd65de40f 2025-09-07T09:35:10.0223281Z ##[endgroup] 2025-09-07T09:35:14.4097630Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-09-07T09:35:14.4259658Z Post job cleanup. 2025-09-07T09:35:14.4491634Z Logging out of registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-09-07T09:35:14.4755433Z Post job cleanup. 2025-09-07T09:35:14.5439847Z Post job cleanup. 2025-09-07T09:35:14.5475870Z Post job cleanup. 
2025-09-07T09:35:14.5958603Z [command]/usr/bin/git version 2025-09-07T09:35:14.5982152Z git version 2.51.0 2025-09-07T09:35:14.5999809Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/15eb7ba9-467f-4c45-b68a-00391d8d7637/.gitconfig' 2025-09-07T09:35:14.6008007Z Temporarily overriding HOME='/home/runner/_work/_temp/15eb7ba9-467f-4c45-b68a-00391d8d7637' before making global git config changes 2025-09-07T09:35:14.6008363Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T09:35:14.6010533Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-09-07T09:35:14.6043488Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T09:35:14.6068826Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T09:35:14.6331129Z Entering 'android/libs/fbjni' 2025-09-07T09:35:14.6366706Z Entering 'third_party/FP16' 2025-09-07T09:35:14.6413391Z Entering 'third_party/FXdiv' 2025-09-07T09:35:14.6458179Z Entering 'third_party/NNPACK' 2025-09-07T09:35:14.6502892Z Entering 'third_party/NVTX' 2025-09-07T09:35:14.6540766Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:35:14.6582590Z Entering 'third_party/XNNPACK' 2025-09-07T09:35:14.6626775Z Entering 'third_party/aiter' 2025-09-07T09:35:14.6666108Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:35:14.6704912Z Entering 'third_party/benchmark' 2025-09-07T09:35:14.6734067Z Entering 'third_party/composable_kernel' 2025-09-07T09:35:14.6773978Z Entering 'third_party/cpp-httplib' 2025-09-07T09:35:14.6820168Z Entering 'third_party/cpuinfo' 2025-09-07T09:35:14.6854526Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:35:14.6896455Z Entering 'third_party/cutlass' 2025-09-07T09:35:14.6944785Z Entering 'third_party/fbgemm' 2025-09-07T09:35:14.6992249Z 
Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:35:14.7022164Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:35:14.7070772Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:35:14.7101205Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:35:14.7134501Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:35:14.7165394Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:35:14.7198489Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:35:14.7235033Z Entering 'third_party/flash-attention' 2025-09-07T09:35:14.7265968Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:35:14.7304147Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:35:14.7339093Z Entering 'third_party/flatbuffers' 2025-09-07T09:35:14.7366656Z Entering 'third_party/fmt' 2025-09-07T09:35:14.7399370Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:35:14.7439023Z Entering 'third_party/gloo' 2025-09-07T09:35:14.7463793Z Entering 'third_party/googletest' 2025-09-07T09:35:14.7498774Z Entering 'third_party/ideep' 2025-09-07T09:35:14.7536030Z Entering 'third_party/ideep/mkl-dnn' 2025-09-07T09:35:14.7567686Z Entering 'third_party/ittapi' 2025-09-07T09:35:14.7597467Z Entering 'third_party/kineto' 2025-09-07T09:35:14.7625165Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:35:14.7655036Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:35:14.7681659Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:35:14.7713819Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:35:14.7741377Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:35:14.7763708Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:35:14.7793123Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:35:14.7820613Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:35:14.7856734Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:35:14.7883219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:35:14.7910981Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:35:14.7934311Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:35:14.7962477Z Entering 'third_party/kleidiai' 2025-09-07T09:35:14.7993567Z Entering 'third_party/mimalloc' 2025-09-07T09:35:14.8026030Z Entering 'third_party/nlohmann' 2025-09-07T09:35:14.8052428Z Entering 'third_party/onnx' 2025-09-07T09:35:14.8083704Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:35:14.8111147Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:35:14.8139032Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:35:14.8165361Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:35:14.8191763Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:35:14.8215058Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:35:14.8238688Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:35:14.8264724Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:35:14.8288235Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:35:14.8313333Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:35:14.8334746Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:35:14.8359141Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:35:14.8392366Z Entering 'third_party/pocketfft' 
2025-09-07T09:35:14.8422654Z Entering 'third_party/protobuf' 2025-09-07T09:35:14.8449789Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:35:14.8492664Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:35:14.8520730Z Entering 'third_party/psimd' 2025-09-07T09:35:14.8551522Z Entering 'third_party/pthreadpool' 2025-09-07T09:35:14.8580932Z Entering 'third_party/pybind11' 2025-09-07T09:35:14.8614030Z Entering 'third_party/python-peachpy' 2025-09-07T09:35:14.8646741Z Entering 'third_party/sleef' 2025-09-07T09:35:14.8673218Z Entering 'third_party/tensorpipe' 2025-09-07T09:35:14.8703283Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:35:14.8731238Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:35:14.8755697Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:35:14.8791724Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:35:14.8814877Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:35:14.8861359Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T09:35:14.8879603Z http.https://github.com/.extraheader 2025-09-07T09:35:14.8886172Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T09:35:14.8909795Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T09:35:14.9066700Z Entering 'android/libs/fbjni' 2025-09-07T09:35:14.9086350Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9106831Z Entering 'third_party/FP16' 2025-09-07T09:35:14.9124705Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9144773Z Entering 'third_party/FXdiv' 2025-09-07T09:35:14.9161133Z http.https://github.com/.extraheader 
2025-09-07T09:35:14.9181310Z Entering 'third_party/NNPACK' 2025-09-07T09:35:14.9200437Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9221109Z Entering 'third_party/NVTX' 2025-09-07T09:35:14.9236925Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9257024Z Entering 'third_party/VulkanMemoryAllocator' 2025-09-07T09:35:14.9282771Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9304030Z Entering 'third_party/XNNPACK' 2025-09-07T09:35:14.9317919Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9346332Z Entering 'third_party/aiter' 2025-09-07T09:35:14.9371040Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9396148Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-09-07T09:35:14.9409739Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9434159Z Entering 'third_party/benchmark' 2025-09-07T09:35:14.9452916Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9473799Z Entering 'third_party/composable_kernel' 2025-09-07T09:35:14.9491617Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9514637Z Entering 'third_party/cpp-httplib' 2025-09-07T09:35:14.9533070Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9553565Z Entering 'third_party/cpuinfo' 2025-09-07T09:35:14.9567820Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9586273Z Entering 'third_party/cudnn_frontend' 2025-09-07T09:35:14.9599731Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9627353Z Entering 'third_party/cutlass' 2025-09-07T09:35:14.9642037Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9671246Z Entering 'third_party/fbgemm' 2025-09-07T09:35:14.9681232Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9704315Z Entering 'third_party/fbgemm/external/asmjit' 2025-09-07T09:35:14.9722309Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9744875Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-09-07T09:35:14.9758306Z http.https://github.com/.extraheader 
2025-09-07T09:35:14.9779994Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-09-07T09:35:14.9797450Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9815320Z Entering 'third_party/fbgemm/external/cutlass' 2025-09-07T09:35:14.9828133Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9856267Z Entering 'third_party/fbgemm/external/googletest' 2025-09-07T09:35:14.9872692Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9890921Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-09-07T09:35:14.9911881Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9929267Z Entering 'third_party/fbgemm/external/json' 2025-09-07T09:35:14.9943583Z http.https://github.com/.extraheader 2025-09-07T09:35:14.9963288Z Entering 'third_party/flash-attention' 2025-09-07T09:35:14.9984568Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0005423Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T09:35:15.0034721Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0051888Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-09-07T09:35:15.0072493Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0101299Z Entering 'third_party/flatbuffers' 2025-09-07T09:35:15.0115681Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0135610Z Entering 'third_party/fmt' 2025-09-07T09:35:15.0152001Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0170006Z Entering 'third_party/gemmlowp/gemmlowp' 2025-09-07T09:35:15.0191471Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0209548Z Entering 'third_party/gloo' 2025-09-07T09:35:15.0232109Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0249752Z Entering 'third_party/googletest' 2025-09-07T09:35:15.0271238Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0292247Z Entering 'third_party/ideep' 2025-09-07T09:35:15.0313555Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0326658Z Entering 'third_party/ideep/mkl-dnn' 
2025-09-07T09:35:15.0344814Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0366688Z Entering 'third_party/ittapi' 2025-09-07T09:35:15.0381379Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0399871Z Entering 'third_party/kineto' 2025-09-07T09:35:15.0415939Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0432449Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-09-07T09:35:15.0448369Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0473941Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-09-07T09:35:15.0492081Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0511827Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-09-07T09:35:15.0531897Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0550674Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-09-07T09:35:15.0564700Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0586677Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-09-07T09:35:15.0595168Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0610092Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-09-07T09:35:15.0623495Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0646079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-09-07T09:35:15.0659252Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0682755Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-09-07T09:35:15.0696718Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0715038Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-09-07T09:35:15.0727059Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0746637Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-09-07T09:35:15.0763563Z 
http.https://github.com/.extraheader 2025-09-07T09:35:15.0785243Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-09-07T09:35:15.0805542Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0824754Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-09-07T09:35:15.0845532Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0867364Z Entering 'third_party/kleidiai' 2025-09-07T09:35:15.0886741Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0907611Z Entering 'third_party/mimalloc' 2025-09-07T09:35:15.0926420Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0945095Z Entering 'third_party/nlohmann' 2025-09-07T09:35:15.0965343Z http.https://github.com/.extraheader 2025-09-07T09:35:15.0984478Z Entering 'third_party/onnx' 2025-09-07T09:35:15.0998844Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1024935Z Entering 'third_party/onnx/third_party/pybind11' 2025-09-07T09:35:15.1038788Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1060140Z Entering 'third_party/opentelemetry-cpp' 2025-09-07T09:35:15.1076935Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1102398Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-09-07T09:35:15.1122350Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1144609Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-09-07T09:35:15.1157853Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1182356Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-09-07T09:35:15.1202111Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1224388Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-09-07T09:35:15.1241118Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1254662Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-09-07T09:35:15.1268410Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1285032Z Entering 
'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-09-07T09:35:15.1298801Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1318717Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-09-07T09:35:15.1332643Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1349667Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-09-07T09:35:15.1362700Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1392192Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-09-07T09:35:15.1405960Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1428676Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-09-07T09:35:15.1441041Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1476707Z Entering 'third_party/pocketfft' 2025-09-07T09:35:15.1491030Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1509339Z Entering 'third_party/protobuf' 2025-09-07T09:35:15.1523999Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1544368Z Entering 'third_party/protobuf/third_party/benchmark' 2025-09-07T09:35:15.1557124Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1577258Z Entering 'third_party/protobuf/third_party/googletest' 2025-09-07T09:35:15.1591287Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1614813Z Entering 'third_party/psimd' 2025-09-07T09:35:15.1628537Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1644628Z Entering 'third_party/pthreadpool' 2025-09-07T09:35:15.1662438Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1685149Z Entering 'third_party/pybind11' 2025-09-07T09:35:15.1711207Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1729532Z Entering 'third_party/python-peachpy' 2025-09-07T09:35:15.1743384Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1765759Z Entering 'third_party/sleef' 2025-09-07T09:35:15.1781035Z http.https://github.com/.extraheader 
2025-09-07T09:35:15.1798313Z Entering 'third_party/tensorpipe' 2025-09-07T09:35:15.1817611Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1845620Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-09-07T09:35:15.1860360Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1883083Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-09-07T09:35:15.1904708Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1932173Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-09-07T09:35:15.1952683Z http.https://github.com/.extraheader 2025-09-07T09:35:15.1970878Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-09-07T09:35:15.1991873Z http.https://github.com/.extraheader 2025-09-07T09:35:15.2009173Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-09-07T09:35:15.2021975Z http.https://github.com/.extraheader 2025-09-07T09:35:15.2185234Z Cleaning up orphan processes